INTRODUCTION:

There is a paucity of information regarding markers and factors associated with prostate
cancer (PCa) outcome in the United States, especially regarding how these factors differ among
racial/ethnic groups. African-American men are more likely to have poorer outcomes than
age- and stage-matched Caucasian patients, and very little is known about prognosis, and
even less about factors that could predict progression, among Hispanics. The overall goal
of our research project is to identify molecular, epidemiological and clinical markers
related to PCa progression in a multiethnic cohort of 1,380 PCa patients (773 Caucasians,
361 African Americans, and 246 Mexican Americans).

BODY: 

Task  1  Patient  follow-up.  (Months  1-30) 

a.  Update  patient  follow-up  data  by  checking  clinical  schedules  and  medical  charts 
for  updated  information.  Using  a  validated  medical  abstraction  form,  all  patient 
charts  will  be  abstracted. 

b.  Signed  medical  releases  of  information  will  be  requested  for  care  received  outside 
of  our  institution.  Copies  of  medical  records  will  be  requested. 

c.  Death  certificates  will  be  obtained  for  all  participants  identified  as  deceased. 

d.  Patients’  self-reported  recurrences  (and  subsequent  treatments)  and  secondary 
cancers  will  be  verified. 

e.  Data  will  be  entered  into  existing  databases. 

Institutional patient records for all participants have been abstracted using the
standardized form attached as Appendix A. In addition to baseline treatment
information, we abstracted follow-up information, such as each prostate-specific
antigen (PSA) level and date, adjuvant care received, prostate-related care (including
care related to complications following treatment, e.g., incontinence and impotence), as
well as additional cancer diagnoses. Institutional medical records were available
electronically; abstractions were performed using a paper form and entered into an
existing clinical database. Institutional patient records were matched by the
institutional Tumor Registry to determine which of our study participants had a return
visit to the institution within the past year; for these participants, the most recent
visit was abstracted and the medical record abstraction was updated. The most recent
clinical follow-up date at our institution was used as the “last date of contact” at the
University of Texas MD Anderson Cancer Center (UTMDACC).

For patients for whom we did not have recent follow-up information at UTMDACC, we
conducted telephone interviews to request follow-up information. We used several
options to obtain updated contact information for these individuals, including
general internet searches, reverse address searches, and credit records. The Acxiom
Insight Collection service, an internet-based paid subscription database, was the most
useful tool for us. Interviews to determine the current health of patients not
returning to UTMDACC were conducted by telephone following a standardized
protocol: individuals were called at least 5 times, at different times of the day as well
as on weekends, following the telephone script included as Appendix B. In
addition, when the call attempts were not successful, we sent a letter to the patient at
the last known valid address (with address correction requested) explaining that we
were trying to contact them regarding their follow-up in a study and requesting that they
contact us at their earliest convenience. Updated health and risk factor information
was collected by trained interviewers using a standardized questionnaire modified
for this project (Appendix C). Following this methodology, we have an average of
more than 10 years of follow-up for the patients included in this study.
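
The call-attempt rule in the telephone protocol can be expressed as a simple check on a
patient's tracking log. The following is a minimal sketch, not the study's actual software:
the function names, the time-of-day cut points, and the example timestamps are illustrative
assumptions. Only the requirement of at least 5 calls, at different times of day and
including weekends, comes from the protocol.

```python
from datetime import datetime

MIN_ATTEMPTS = 5  # minimum number of calls stated in the protocol


def time_band(ts):
    """Coarse time-of-day band; the cut points are illustrative assumptions."""
    if ts.hour < 12:
        return "morning"
    if ts.hour < 17:
        return "afternoon"
    return "evening"


def protocol_satisfied(attempts):
    """Return True when the logged attempts include at least 5 calls,
    more than one time-of-day band, and at least one weekend call."""
    bands = {time_band(ts) for ts in attempts}
    has_weekend = any(ts.weekday() >= 5 for ts in attempts)  # 5 = Saturday, 6 = Sunday
    return len(attempts) >= MIN_ATTEMPTS and len(bands) >= 2 and has_weekend


# Hypothetical tracking log for one patient (dates are made up for illustration).
log = [
    datetime(2011, 3, 7, 9),    # Monday morning
    datetime(2011, 3, 8, 13),   # Tuesday afternoon
    datetime(2011, 3, 9, 18),   # Wednesday evening
    datetime(2011, 3, 10, 10),  # Thursday morning
    datetime(2011, 3, 12, 11),  # Saturday morning
]
print("Ready to send follow-up letter:", protocol_satisfied(log))
```

A check of this kind would only indicate that the minimum call attempts were completed; the
decision to mail the follow-up letter still depends on none of the calls having reached the
patient.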

After completing the Centers for Disease Control Institutional Review Board (IRB)
application, we obtained approval to receive vital status from the National Death
Index (NDI), as well as immediate and underlying causes of death for deceased individuals.
We have linked our patient database to the NDI and updated vital status for all
patients. In addition, we obtained IRB approval from the Texas Department of Health
and Vital Statistics to link our patient database with the registered deaths in Texas
and surrounding states. Date of death and, when available, cause of death have
been recorded in the study database for all known decedents.

Task  2  Evaluate  Constitutional  Markers  of  Genetic  Susceptibility.  (Months 
1-30) 

a.  Genotyping  assays  for  all  genes  will  be  established,  tested  and  validated 
by  the  Department  of  Epidemiology  Genotyping  Core  (Months  1-24). 

Genotyping has been completed in the Department of Epidemiology Genotyping
Core using the Illumina Infinium II Assay.

b.  Biological  samples  for  all  participants  will  be  located  and  retrieved  from 
study  archive  freezers  (Months  1-3). 

Using  our  laboratory  tracking  database,  biological  samples  for  this  study  were 
identified,  located  and  retrieved  from  our  freezer  facility  and  transferred  to  the 
genotyping  facility. 

c.  DNA  will  be  extracted  from  banked  specimens  (Months  1-12). 

DNA  was  extracted  from  all  of  the  banked  specimens.  The  DNA  quality,  quantity 
and  purity  were  assessed  for  each  sample  to  increase  the  likelihood  of  success  of 
the  genotyping.  The  extracted  DNA  was  successfully  used  for  the  genotyping 
assays  performed  and  reported  below. 

d.  DNA  samples  will  be  plated  for  genotyping  analyses  -  half  the  samples  will 
be  done  in  Year  2  and  the  other  half  will  be  done  in  Year  3  (Months  13  & 
25) 

All samples have been quantified, standardized, plated, and submitted for
genotyping.

e. Genotyping will be done for half the samples in Year 2 (Months 13-24) and
the other half in Year 3 (Months 25-30).

Initial genotyping was done using the proposed methodology for 611 cases for
MMP-1, 615 for E-cadherin, 433 for beta-2-adrenergic receptor, and 725 for cyclin
D1. In our preliminary analyses, we found significant differences in genotype
frequency between racial/ethnic groups for MMP-1, beta-2-adrenergic receptor and
cyclin D1. However, due to improvements in technology and published reports in
the recent literature, we changed our genotyping methodology to use the Illumina
platform for the final genotyping analyses. Using the Illumina Infinium II platform,
we genotyped DNA samples from 1,275 patients for 96 single nucleotide
polymorphisms (SNPs) identified by genome-wide association studies and
validation studies as playing a role in PCa risk. As a result of our collaborations
with several multi-ethnic consortia (led by Tim Rebbeck at the University of
Pennsylvania, Brian Henderson at the University of Southern California, and Ros
Eeles at the Institute of Cancer Research Royal Cancer Hospital-London), several
novel SNPs have been identified that play a role specifically in PCa risk among
African-Americans (Appendix E).
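
The report does not state which statistical test was used to compare genotype frequencies
between racial/ethnic groups. A Pearson chi-square test of independence on a
group-by-genotype table of counts is one common choice, and the sketch below illustrates
it under that assumption. The function names are illustrative and the counts are
hypothetical placeholders, not study data.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf_even_dof(stat, dof):
    """Upper-tail probability of the chi-square distribution.

    Uses the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, valid only for even dof = 2k.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires positive, even degrees of freedom")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))


# HYPOTHETICAL genotype counts (rows: three racial/ethnic groups;
# columns: the three genotypes of one biallelic SNP). Not study data.
counts = [
    [120, 90, 40],
    [60, 70, 50],
    [50, 45, 20],
]
stat, dof = chi_square_independence(counts)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {chi_square_sf_even_dof(stat, dof):.4f}")
```

With three groups and three genotypes the table has 4 degrees of freedom, so the
even-degrees-of-freedom closed form applies; a statistics library would be the usual choice
for the general case.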


Task  3  Final  Analysis  and  Preparation  of  Reports.  (Months  30-36) 

The preliminary results of this study were presented in part at the 2011
Department of Defense IMPaCT meeting. We have completed the final analyses
of the data as proposed regarding the associations with disease progression and
advanced stage at diagnosis, and are preparing manuscripts for submission. In
addition, the results from the African-American population were included as part
of our collaborations with the multi-ethnic consortia (see Appendix E).


KEY  RESEARCH  ACCOMPLISHMENTS: 

We found that different combinations of PCa susceptibility loci were associated
with PCa outcome among Whites vs. African Americans (AAs), and these loci
differed between disease progression and metastatic disease at diagnosis. There
were no significant differences between Hispanic and non-Hispanic Whites with
respect to the associations with susceptibility loci. These data, in concert with our
previous work on PCa risk, support the finding that PCa in AAs may have a
different etiologic basis than PCa in Whites. Future research will focus on
developing models to evaluate the role of these loci in identifying subgroups of
patients who may benefit from targeted early intervention to prevent disease
progression/metastasis.




REPORTABLE  OUTCOMES: 

To date there have been 4 published papers (Appendix E), and 2 others are
currently pending review. No patents or licenses have been applied for based on
this award. Additionally, no degrees have been supported by this award; no cell
lines, tissue or serum repositories have been developed; no informatics have
been applied for based on work from this award; and no employment
opportunities have been applied for and/or received based on
experience/training supported by this award. An abstract with these data was
presented at the 2011 IMPaCT meeting. Preliminary data (numbers of
participants with follow-up information) have been included in 2 funded grant
proposals: U01 - Genome-wide association study of prostate cancer in African
Americans (Henderson); U19 - Trans-disciplinary cancer genomics research:
post-GWA initiative (Henderson/Eeles).

CONCLUSION: 

Our research may help explain ethnic/racial disparities in PCa progression and
provide direction towards eliminating these disparities. Additionally, our results
may guide future studies to develop ethnicity/race-specific interventions (e.g.,
behavioral, clinical) to improve outcome in the most common cancer in American
men.


APPENDIX A:


Medical Records Abstraction Form


Name  MDACC# _ 

_ MDA  registration  date  _ / _ / 


Address 


Date  of  birth  /  / 


Age  at  diagnosis _ years 

Phone  number _ 


Ethnicity

□ White

□ Hispanic ->  □ Mexican  □ Cuban  □ S. American  □ Other

□ African-American

□ Asian

□ Other

Vital  status  □  Living  □  Deceased  — » Date  of  death  /  / 

Place  of  death _ 

Cause  of  death _ 

Last  date  of  contact  /  /  Place  of  contact 


Height: _ cm  or  _ ft/inches

Weight: _ kg  or  _ lbs

Prostate  cancer  diagnosis 


Date  of  diagnosis  / _ / _  Place  of  diagnosis:  MDACC  □  Yes 

□  No 


Diagnostic tests

□ Biopsy  □ POS  □ NEG

□ TURP  □ POS  □ NEG

□ Chest x-ray  □ POS  □ NEG

□ Bone scan  □ POS  □ NEG

□ CT scan  □ POS  □ NEG

□ Other  □ POS  □ NEG

Where _  When  /  /

Comments

Clinical stage of diagnosis

□ Organ confined disease

□ Regional disease

□ Metastatic disease -> date of confirmation  /  /

Sites:  □ Bones  □ Liver  □ Adrenal gland  □ Kidney  □ Brain  □ Other

TNM stage

T ->  □ X  □ 0  T1  □ a  □ b  □ c  T3  □ a  □ b  T4

N ->  □ X  □ 0  □ 1  □ 2  □ 3

M ->  □ X  □ 0  □ 1

Summary

Comments

Laboratory results


Post-treatment values

Most recent post-treatment PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Follow-up PSA value _ ng/ml  Date _ / _ / _

Initial post-treatment PSA value _ ng/ml  Date _ / _ / _

Pre-treatment values

Highest pre-treatment PSA value _ ng/ml  Date _ / _ / _

Initial pre-treatment PSA value _ ng/ml  Date _ / _ / _

Comments:


Pathology report  Pathology report #: _

Specimen type  □ Prostatectomy

MDACC grade  □ I  □ II  □ III  □ IV  □ other

Seminal vesicle involvement  □ Yes  □ No

S/Margins  □ Positive  □ Negative

Combined Gleason score  □ 2  □ 3  □ 4  □ 5  □ 6  □ 7  □ 8  □ 9  □ 10

Dominant focus size / size _ cm  Prostate volume _ cm

Tumor locations  □ Peripheral zone  □ Central zone  □ Transitional zone  □ AFM zone

Comments _

Pathology report  Pathology report #: _

Specimen type  □ Biopsy

MDACC grade  □ I  □ II  □ III  □ IV  □ other

Combined Gleason score  □ 2  □ 3  □ 4  □ 5  □ 6  □ 7  □ 8  □ 9  □ 10

Dominant focus size / size _ cm  Prostate volume _ cm

Tumor locations  □ Peripheral zone  □ Central zone  □ Transitional zone  □ AFM zone

Comments

History  of  prostate  cancer  screening 

□  No 

□Yes  — »  Type  of  screening  test  □  Prostate-specific  antigen  (PSA) 

□  Digital  rectal  examination  (DRE) 

□  Trans-rectal  ultrasound  (TRUS) 

□  Other _ 

Presence of urinary symptoms  □ Yes  □ No

Comments:


Prostate  cancer  treatment  received 


□ Radical prostatectomy  Date _ / _ / _

Type ->  □ Radical retropubic prostatectomy (RRP)

□ Radical perineal prostatectomy (RPP)

□ Nerve-sparing
□ Pelvic lymphadenectomy

□ Orchiectomy -> Date  /  /

□ Cryosurgery -> Date  /  /

For each of the following: onset of treatment and end of treatment

□ Radiotherapy (EBRT) ->  Onset date  /  /  End date  /  /

□ Brachytherapy ->  Onset date  /  /  End date  /  /

□ Hormonal therapy ->  Onset date  /  /  End date  /  /

□ Immunotherapy ->  Onset date  /  /  End date  /  /

□ Surveillance ->  Onset date  /  /  End date  /  /

□ Chemotherapy ->  Onset date  /  /  End date  /  /

□ Other (specify) ->  Onset date  /  /  End date  /  /

Comments 


Complications of treatment

Urinary incontinence  □ No  □ Yes ->

Uses sanitary pad  □ No  □ Yes -> number /day

Treatment received

Post-treatment status (1 yr.)  Number of pads/day _  Date  /  /

Impotence  □ No  □ Yes ->

Treatment received

Post-treatment status (1 yr.)

Urinary retention  □ No  □ Yes

Treatment received

Other


Comorbid conditions prior to diagnosis of prostate cancer

□ No

□ Yes

□ Diabetes (IDDM, NIDDM)  Date of diagnosis _ / _ / _

□ Hemorrhage  Date of diagnosis _ / _ / _

□ Hypertension  Date of diagnosis _ / _ / _

□ Peptic ulcer disease  Date of diagnosis _ / _ / _

□ Congestive heart failure  Date of diagnosis _ / _ / _

□ Pancreatitis  Date of diagnosis _ / _ / _

□ Myocardial infarction  Date of diagnosis _ / _ / _

□ Cholelithiasis  Date of diagnosis _ / _ / _

□ Stroke  Date of diagnosis _ / _ / _

□ Alcoholism  Date of diagnosis _ / _ / _

□ Chronic obstructive pulmonary disease  Date of diagnosis _ / _ / _

□ Lupus erythematosus  Date of diagnosis _ / _ / _

□ Other  Date of diagnosis _ / _ / _

Other  pertinent  information 


Recurrence of prostate cancer  □ No  □ Yes ->

Date of diagnosis _ / _ / _

Place of diagnosis

Basis of diagnosis

Type of treatment

Diagnostic tests

□ Biopsy  □ POS  □ NEG

□ TURP  □ POS  □ NEG

□ Chest x-ray  □ POS  □ NEG

□ Bone scan  □ POS  □ NEG

□ CT scan  □ POS  □ NEG

□ Other  □ POS  □ NEG

Conditions diagnosed after diagnosis of prostate cancer


APPENDIX  B: 


Follow-up  telephone  recruitment  script 

SCRIPT  1  (Speaking  to  person  who  answers  phone)  - 

Hello,  my  name  is  (INTERVIEWER’S  NAME)  and  I  am  calling  on  behalf  of  MD 
Anderson  Cancer  Center,  here  in  Houston.  May  I  please  speak  with  (PATIENT’S 
NAME)? 

> NOT AVAILABLE - Verify (PATIENT’S NAME) lives at this residence. Ask “Is there a time
that I could call back and speak with him?” OR “Would you please ask him to call me,
(INTERVIEWER’S NAME), at (PHONE NUMBER) at his earliest convenience?” Thank you for
your assistance.

> YES - Thank you... (Wait for (PATIENT’S NAME) to come to the phone.) Hello, my name is
(INTERVIEWER’S NAME) and I am calling on behalf of MD Anderson Cancer Center, here in
Houston. You participated in one of our prostate cancer studies a few years ago, and we are
conducting a follow-up study to see how you are doing. Would it be all right with you if I
asked you a few questions about your health and updated your information?

• NO - Thank you for your time. If you change your mind and would like to participate,
please contact me, (INTERVIEWER’S NAME), at (PHONE NUMBER).

• YES - I want to let you know that answering these questions is completely voluntary, and
you may decide not to answer any or all of them. (Administer risk factor questionnaire
(Appendix C))

Following each call, the interviewer logs the call on the tracking log for that file, documenting
the date, time, phone number dialed, and the person with whom they spoke. These logs are
maintained in the individual patient’s study chart, which is coded by study identification
number and kept in a locked office.


APPENDIX C:

Follow-Up questionnaire


PROSTATE CANCER FOLLOW-UP STUDY

M.D. Anderson Cancer Center
Department of Epidemiology

STUDY NUMBER:

Med Record/Patient#:

Date of PC Diagnosis:  /  /

Date of Baseline Interview:  /  /

Patient receiving Follow-up Care At MDACC:

_ (1) YES  Date of Most Recent MDACC Visit:  /  /

_ (2) NO

Interview Date:  /  /

Interviewer’s Initials:

Last Name  First Name  M.I.

Street Address
City  State  Zip Code

HOME PHONE: ( _ )

WORK PHONE: ( _ )

SSN: _

Who is Completing Questionnaire?  O PATIENT  O PROXY

If Patient is Deceased, Date of Death _  County & State of Death

As you may remember, you participated in a study of prostate cancer. We are currently updating our information, and we wanted to see how
you are doing. Do you have a few moments to talk to me now, or when can I call you back?

1. Are you currently being followed-up for your previous prostate cancer?  _ YES (1)  _ NO (2)


2.  Where  are/were  you  receiving  follow-up  care? _ 

3.  When  was  your  most  recent  follow-up  visit? _ (Date) 

When was the last time you had (the following test(s))? What were the results?

For each test: Most Recent Date; Result (most recent)

4. Prostate Specific Antigen (PSA)  Normal (1) go to Q.8  Abnormal (2) go to Q.5

5. Ultrasound (TRUS)

6. Biopsy or Transurethral Resection of Prostate (TURP)

7. Other (specify)

8. Have you received any prostate treatment since you were last seen at MD Anderson/Kelsey-Seybold/VAMC/Dr.
_ (select provider) in _ (fill in last date)?

_ (1) YES

_ (2) NO  Skip to Q. 12

9. When and where were/are you receiving treatment? (e.g., MD/Clinic Name, Address, Phone #)

Office Note: Obtain signed medical release of information

10. What type(s) of treatment did you receive? (e.g., radiation, hormone shots, hormone pills, chemotherapy)

11. Why was the treatment necessary?


Have you ever been told by a doctor or another health care professional that you have any of the following conditions?

For each condition: Been told?; Date/Age Diagnosed; Treatment/Medication Name

12. Diabetes (or sugar in urine)  (1) YES  (2) NO

13. Hypertension (high blood pressure)  (1) YES  (2) NO

14. Angina (angina pectoris)  (1) YES  (2) NO

15. Heart attack (myocardial infarction)  (1) YES  (2) NO

16. Any other kind of heart condition or disease (not mentioned above)  SPECIFY:  (1) YES  (2) NO

17. High cholesterol  (1) YES  (2) NO

18. Arthritis  TYPE:  (1) YES  (2) NO

19. Any other cancer(s)?  SPECIFY  (1) YES  (2) NO

20. Any other condition(s)?  SPECIFY  (1) YES  (2) NO

Previous Smoking Status:  Current  Former  Never


TOBACCO

The next questions are about smoking.

21. Since your prostate cancer diagnosis, has your smoking status changed?

22. Are you currently smoking cigarettes?  _ (1) YES  _ (2) NO

23. On average, how many cigarettes per day do you/did you smoke? _

MEDICATION/SUPPLEMENT USE


The next questions are about medication and supplement use.

24. Have you taken any supplement, over-the-counter medication, or prescription medication at least once a month since your
diagnosis? This would include all vitamins, minerals, herbal and non-herbal supplements of any kind.

_ (1) Yes, fairly regularly

_ (2) No, GO TO Q. 26

_ (3) Yes, but NOT regularly


25. Please list the names of any supplements (including vitamins, minerals and herbal supplements), over-the-counter medications or
prescription medications that you have taken. Also include the number of pills or tablets taken daily, weekly, monthly or yearly.

For each item: For Office Use: code; Supplement, over-the-counter or prescription medication; Number per Day;
Number per Week; Number per Month; Number per Year; Rarely/Never; How many years?; Dose

Brand: _  Name on bottle: _

Brand: _  Name on bottle: _

Brand: _  Name on bottle: _

Brand: _  Name on bottle: _


DIET


The following questions are regarding diet changes.
Since your diagnosis, have you changed your consumption of the following types of foods?

FOOD TYPE

26. Fat  (1) increased  (2) decreased  (3) no change

27. Fruits  (1) increased  (2) decreased  (3) no change

28. Vegetables  (1) increased  (2) decreased  (3) no change

29. Fiber  (1) increased  (2) decreased  (3) no change

30. Soy products  (1) increased  (2) decreased  (3) no change

31. Are there any comments that you would like to add about your diet or about the way you have changed your diet?


FAMILY  HISTORY 


In  this  section,  I  would  like  to  ask  you  some  questions  about  your  family 


FAMILY HISTORY PRE-CODE:

Previously reported family members WITH cancer: _

For each: Sex; Relative; Side of Family; Type of Cancer

32. Previously, you told us that your (insert previous history here) had cancer. Have any other immediate family
members been diagnosed with cancer?

_ YES (1)

_ NO (2)  Go to Q. 34

33. Would you please give us some information about these NEW family members diagnosed with cancer? (DON’T include those
previously reported)

For each relative: Rel Code; Sex; Relative; Rel UIN; When was he/she born?; What kind of cancer?; ICD-9;
When was he/she diagnosed?; Is he/she still living? (1) Yes (2) No; When did he/she die?



OCCUPATIONAL HISTORY


In this section, I would like to ask you some questions about your current occupation.

34. What is your job or occupation?

Current Job: Years employed (to _); Major duties; Equipment used (Any Chemicals?); Work done by company; SIC; OCC; Spec

If we need additional information from you in the future, can we contact you by telephone?  _ (1) YES  _ (2) NO


This  is  the  end  of  our  interview.  I  would  like  to  thank  you  for  your  help  with  our  research.  If  you  have  any  questions  that  I  or  Dr.  Strom 
can  answer  in  the  future,  please  feel  free  to  contact  us.  We  would  also  like  to  verify  that  we  have  your  current  address  correctly 
recorded.  We  have  your  current  address  as:  READ  ADDRESS  FROM  FILE  RECORD 

Is this address correct?  _ (1) YES  _ (2) NO  (If NO, please provide correct information below)


First  Name 


Middle  Name 


Last  Name 


Street  Address 


City 


State 


Zip  Code 


Also, so that we may keep contact with you, would you please give me the name, address, and telephone number of a person who does not live with you who
will know your whereabouts in the future:


First Name  Middle Name  Last Name

Street Address

City  State  Zip Code

Thank you once again for your time and help with our research project. If we have any more questions in the future, we hope we can
call you again.

INTERVIEW ASSESSMENT


Date of interview:  /  /

Time Interview began: _

Time Interview ended: _

Interviewer’s Initials:

1. Respondent’s cooperation was:

_ Very Good (1)

_ Good (2)

_ Fair (3)

_ Poor (4)

2. The quality of the interview was:

_ Highly Reliable (1)

_ Generally Reliable (2)

_ Questionable (3)

_ Unsatisfactory (4)


Please write comments about the interview:


APPENDIX  D: 

Medical release of information form

AUTHORIZATION FOR DISCLOSURE OF HEALTH INFORMATION


(1) I hereby authorize _ to disclose the following information from the health records of:

Patient Name (Last, First, M.I.): _

Date of Birth: _

MDA #: _

Address (Street, City, State, Zip Code): _

Phone: _

covering the period of healthcare from _ to _


(2)  Information  to  be  disclosed: 

□  Complete  Health  Record 

□  Primary  Medical  Evaluation 

□  Progress  Notes 

□  X-Ray  Reports 

□  Discharge  Summary 


□  Consultation  Reports 

□  Laboratory  Tests 

□  Radiotherapy  Notes 

□  Chemotherapy  Notes 

□  Nurse's  Notes 


□  Other  (specify)  _ 

I understand that this will include information relating to (check if applicable):

□  Acquired  Immunodeficiency  Syndrome  (AIDS)  or  infection  with  HIV  (Human  Immunodeficiency 
Virus) 

□  Psychiatric  care 

□  Treatment  for  alcohol  and/or  drug  abuse 


(3)  This  information  is  to  be  disclosed  to:  Dr.  Sara  Strom 


Investigator’s  signature 

UT  MD  Anderson  Cancer  Center 
1515  Holcombe,  Houston,  Texas  77030 

for  the  purpose  of:  Medical  Record  completion  for  research  protocol  M91-004. 

(4) I understand this authorization may be revoked in writing at any time, except to the extent that action
has been taken in reliance on this authorization. Unless otherwise revoked, this authorization will
expire on the following date, event, or condition:

(5) The facility, its employees, officers, and physicians are hereby released from any legal responsibility
or liability for disclosure of the above information to the extent indicated and authorized herein.

Signed: _ 

(patient)  (date) 


or 


(Legal  Representative)(Relationship  to  Patient)  (date) 


SUPPORTING DATA: N/A


Abstract

Background: Genome-wide association studies (GWAS) have identified numerous prostate cancer
susceptibility alleles, but these loci have been identified primarily in men of European descent. There is
limited information about the role of these loci in men of African descent.

Methods: We identified 7,788 prostate cancer cases and controls with genotype data for 47
GWAS-identified loci.

Results: We identified significant associations for SNP rs10486567 at JAZF1, rs10993994 at MSMB,
rs12418451 and rs7931342 at 11q13, and rs5945572 and rs5945619 at NUDT10/11. These associations were
in the same direction and of similar magnitude as those reported in men of European descent. Significance
was attained at all reported prostate cancer susceptibility regions at chromosome 8q24, including associations
reaching genome-wide significance in region 2.

Conclusion: We have validated in men of African descent the associations at some, but not all, prostate
cancer susceptibility loci originally identified in European descent populations. This may be due to the
heterogeneity in genetic etiology or in the pattern of genetic variation across populations.

Impact: The genetic etiology of prostate cancer in men of African descent differs from that of men of
European descent. Cancer Epidemiol Biomarkers Prev; 20(1); 23-32. ©2011 AACR.


Introduction 

The differences in prostate cancer incidence and mortality across men of different
racial groups are well documented. According to SEER, prostate cancer has
an age-adjusted incidence rate of 234.6 per 100,000 in
African American and 150.4 per 100,000 in European
American men. In addition, a 2.4-fold difference in mortality
rate (62.3 per 100,000 in African Americans vs. 25.6
per 100,000 in European Americans) represents the greatest
disparity between these groups of any major cancer
site. Despite this profound public health concern, knowledge
of the etiologic underpinnings of this disparity
remains limited. It is likely that inherited susceptibility,
environmental exposures, lifestyle, behavior, screening,
and cancer treatment all influence the disparity between
men of different racial and ethnic backgrounds.

A number of recent genome-wide association studies
(GWAS) have identified numerous prostate cancer susceptibility
loci including CTBP2 (chr. 10q26), EHBP1 (chr.
2p15), HNF1B (chr. 17q12), IGF2/IGF2A/INS (chr. 11p15),
ITGA6 (chr. 2q31), KLK2/3 (chr. 19q13), LMTK2 (chr.
7q21), MSMB (chr. 10q11), NKX3.1 (chr. 8p21),
NUDT10/11 (chr. Xp11.22), PDLIM5 (chr. 4q22), SELB
(chr. 3q21.3), SLC22A3 (chr. 6q25), TET2 (chr. 4q24),
THADA (chr. 2p21), TTLL1/BIK/MCAT/PACSIN2 (chr.
22q13), as well as loci on chromosome 11q13, 17q12,
17q24, and multiple regions at chromosome 8q24 (1-17).
These loci were discovered primarily in European
descent men (EDM), with the exception being the prostate
cancer susceptibility loci at chromosome 8q24, which
were identified by linkage and admixture mapping (15, 18).
Studies suggest that some genetic variants confer risk
across populations but with different magnitudes of the
risk in different populations, or they may only confer risk
in one population but not in others (11, 19). Because the
prevalence of prostate cancer and the allele frequencies
differ between EDM and African descent men (ADM), it
is important to estimate the effects of these GWAS risk
variants originally identified in EDM on ADM before
generalizing the GWAS associations to ADM. Three
recent studies have also attempted to validate associations
between some of the loci listed above and prostate
cancer in ADM. Xu et al. (20) studied 868 cases and 878
controls and validated the loci at 8q24 (P = 0.034 to P = 2 x
10^-5) and 3p12 (P = 0.029). Waters et al. (19) studied 860
cases and 575 controls and validated KLK2/3 (19q13.33)
and NUDT10/11 (Xp11.22). Finally, Hooker et al. (21)
validated 8q24 (P = 1 x 10^-4), 11q13.2 (P = 0.009),
HNF1B/TCF2 (17q12; P = 0.008), KLK2/3 (19q13.33; P =
0.04), and NUDT11 (Xp11.22; P = 0.05) in 454 cases and
301 controls. The validated loci were not consistent across
these studies, perhaps due to relatively small sample
sizes in each study. To confirm associations at previously
identified prostate cancer susceptibility loci in ADM, we
obtained data from 7,788 ADM from 19 centers in the
United States and the United Kingdom for pooled analyses
of GWAS-identified loci and prostate cancer.

Methods 

Study  sample 

The sample studied here consisted of 4,040 cases and
3,748 controls ascertained from 19 centers (Supplementary
Table 1). A detailed description of each center's
study is presented in Appendix 1 and a summary of
the study methods is presented in Supplementary Table 5.
These studies include the Prostate Cancer Genetics Studies
(CaP Genes) at the University of California (22), Fred
Hutchinson Cancer Research Center (FHCRC) Prostate
Cancer Studies (23, 24), The Prostate Risk Assessment
Program (PRAP) at Fox Chase Cancer Center (25), The
Flint Men's Health Study (FMHS; refs. 26, 27), Gene-Environment
Interaction in Prostate Cancer (GECAP)
Study at Henry Ford Hospital (28), Los Angeles County
Study (LACS; ref. 29), Prostate Cancer Clinical Outcome
Study (PC2OS) at the University of Louisville (30), MD
Anderson Cancer Center (31), The Multiethnic Cohort
Study (MEC; ref. 32), Moffitt Cancer Center Study (33),
NCI Prostate Tissue Study (NCIPTS), University of Pennsylvania
Study of Cancer Outcomes, Risk, and Ethnicity
(SCORE; ref. 34), University of Texas San Antonio Center
for Biomarkers of Risk for Prostate Cancer (SABOR),
University of Texas Health Science Center at San Antonio
(35, 36), San Francisco Bay Area Prostate Cancer Study
(SFBAPCS; ref. 37), United Kingdom Genetic Prostate
Cancer Study (UKGPCS), and Wake University Consortium
including participants from the Johns Hopkins University,
Wake Forest University, and Washington University
(20). Two of these studies, SFBAPCS and UKGPCS, have
contributed only to case-case analyses of disease aggressiveness
because only cases were available from these 2
studies. Single-nucleotide polymorphisms (SNPs) were
chosen if they were implicated in previous GWAS studies
(1-3, 38), in follow-up fine-mapping studies (5-7, 39, 40),
or associated with disease aggressiveness (4, 41). Available
SNPs in all regions of 8q24, some of which were
initially identified through linkage and admixture mapping
in ADM and confirmed in GWAS studies, were also
included (10, 11, 14-16, 42).

Genotype data were excluded if they were found to
have genotyping failure rates greater than 5% within each
study center or if they deviated significantly from Hardy-
Weinberg proportions. We set a threshold of P < 0.001
based on multiple-test adjustment for the number of
SNPs tested (family-wise error rate P = 0.05 divided
by 50 SNPs equals P = 0.001). SNPs were included
in the present analysis if we obtained at least 1,000
genotypes in cases and controls from the contributing
centers by October 2009. The data contributed by each
center for each SNP are summarized in Supplementary
Table 6.
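
The inclusion rules in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names and the example counts are hypothetical, the per-center failure-rate check is collapsed into a single check, and only the thresholds (5% failure rate, 0.05 divided by 50, and 1,000 genotypes) come from the text.

```python
# Illustrative QC filter; function names and inputs are hypothetical.

FAMILYWISE_ALPHA = 0.05   # family-wise error rate stated in the text
N_SNPS_TESTED = 50        # number of SNPs used for the adjustment in the text


def hwe_threshold(alpha=FAMILYWISE_ALPHA, n_tests=N_SNPS_TESTED):
    """Per-SNP threshold: family-wise alpha divided by the number of tests."""
    return alpha / n_tests


def passes_qc(n_attempted, n_called, hwe_p,
              max_failure_rate=0.05, min_genotypes=1000):
    """Return True if a SNP meets the three inclusion rules described above."""
    failure_rate = 1.0 - n_called / n_attempted
    if failure_rate > max_failure_rate:
        return False                      # genotyping failure rate above 5%
    if hwe_p < hwe_threshold():
        return False                      # departure from Hardy-Weinberg proportions
    return n_called >= min_genotypes      # at least 1,000 genotypes obtained
```

For example, a SNP with 1,180 successful calls out of 1,200 attempts and a Hardy-Weinberg P value of 0.2 would be retained under these rules, whereas the same SNP with a Hardy-Weinberg P value of 0.0005 would be excluded.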

Statistical  methods 

Departure from Hardy-Weinberg equilibrium was
assessed for each SNP in control subjects of the combined
study populations using the chi-square goodness-of-fit
test. Any SNP that showed departure from Hardy-
Weinberg equilibrium with P < 0.001 in controls was
excluded from subsequent analyses. Unconditional logistic
regression models were used to estimate odds ratios
(OR) and 95% CIs to measure the association between
individual SNP genotypes and prostate cancer risk or
disease aggressiveness defined as Gleason score <7 versus
7+ or tumor stage T1/T2 versus T3/T4. Analyses
were undertaken using an additive mode of inheritance,
adjusting for age and study centers (results shown in
Table 1 and Supplementary Tables 2-4).
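
As an illustration of the Hardy-Weinberg check described above, the following Python sketch computes the 1-degree-of-freedom chi-square goodness-of-fit statistic from genotype counts in controls. It is a sketch only: the function name and the genotype counts in the usage note are hypothetical, and the study itself reports using SAS and PLINK rather than this code.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    Takes genotype counts (AA, AB, BB) and returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1.0 - p                           # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With hypothetical counts of 100, 200, and 100 the observed and expected counts coincide, so the statistic is zero; counts of 150, 100, and 150 show a marked heterozygote deficit and would fall below the P < 0.001 exclusion threshold.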

Subgroup analyses were also carried out to estimate
whether African ancestry affected the reported
associations. This analysis included a subset of study
centers for which estimated percentage of African ancestry
was available (Supplementary Table 5). Centers used
different ancestry informative marker (AIM) panels
(Supplementary Table 5). These AIMs were obtained
from the original genotyping methods used in each
center, and were comparable on the basis of several
measures of marker informativeness (FST, FIC, and S).
The statistical methods used to estimate ancestry proportion,
STRUCTURE and ANCESTRYMAP, use the
same hierarchical model and probabilistic measures
and would be expected to yield similar, highly correlated
measurements. In addition, we analyzed data stratifying by center
to adjust for potential confounding by ancestry proportion
within each participating study and to minimize the
influence of varying informativeness of AIM panels.

These studies include nested case-control studies from
within cohorts, matched and unmatched case-control
studies, as well as case-only series. To address the potential
study heterogeneity, age-adjusted ORs and 95% CIs
for SNPs were estimated for each study population separately,
and forest plots were generated for independent
SNPs with P values < 0.05 (Supplementary Fig. 1). Potential
heterogeneity in the association of SNPs with prostate
cancer among study populations was examined by the
Breslow-Day homogeneity test. All statistical analyses were
performed using SAS 9.2 and PLINK (43). An LD heat
map (Fig. 1) was generated on the basis of HapMap YRI
data using Haploview (44). Inferences were made using
2-sided hypothesis testing with a P value < 0.05. Because
this is a validation study, we did not correct for multiple
hypothesis tests.
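
The per-center estimates that feed a forest plot, and a simple check of between-center heterogeneity, can be sketched as follows. This is an illustration under several assumptions: it uses crude (not age-adjusted) odds ratios from 2x2 allele-count tables with Woolf standard errors, and it substitutes Cochran's Q for the Breslow-Day test named in the text because Q is easy to compute from the log odds ratios alone. All function names and inputs are hypothetical.

```python
import math


def log_or_and_se(case_risk, case_other, ctrl_risk, ctrl_other):
    """Crude log odds ratio and Woolf standard error from a 2x2 allele table."""
    log_or = math.log((case_risk * ctrl_other) / (case_other * ctrl_risk))
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    return log_or, se


def or_with_ci(log_or, se, z=1.96):
    """Odds ratio with a 95% confidence interval on the OR scale."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def cochran_q(estimates):
    """Inverse-variance pooled log OR and Cochran's Q across centers.

    `estimates` is a list of (log_or, se) pairs, one per center.
    Returns (pooled_log_or, q_statistic, degrees_of_freedom).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return pooled, q, len(estimates) - 1
```

Under the null hypothesis of homogeneous effects, Q is compared against a chi-square distribution with the returned degrees of freedom; a Q close to zero indicates that the center-specific estimates agree closely.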

Results 

We were able to validate some, but not all, prostate
cancer GWAS loci (Table 1 for SNPs outside of 8q24
regions, and Supplementary Table 2 and Figure 1 for
SNPs located within 8q24). Most associations reported
here were in the same direction and with an equal or
smaller magnitude as those originally reported in EDM.
However, a number of associations reported here were
not in the same direction as those reported in EDM (i.e.,
CTBP2, 11q13, and 22q13; Table 1), suggesting that these
alleles are not consistent with prostate cancer risk in
ADM. A number of loci that were implicated in EDM
were not associated with prostate cancer risk in ADM.
These included CTBP2 (rs4962416), 11q13 (rs12418451),
IL16 (rs4072111), CDH13 (rs4782726), and 22q13
(rs9623117) with OR < 1 (i.e., in the opposite direction
from that reported in EDM), and EHBP1 (rs721048),
LMTK2 (rs6465657), MINPP1 (rs12771728), chromosome
12 (rs902774), and KLK2/3 (rs887391) with OR near 1.0.
Furthermore, the upper bound of the 95% CI for a number
of loci in ADM did not reach earlier estimates
made in EDM, including 3p12.1 (rs2660753), DAB2IP
(rs1571801), MSMB (rs10993994), CTBP2 (rs4962416),
HNF1B (rs4430796 and rs7501939), KLK2/3 (rs2735839),
22q13 (rs9623117), and NUDT10/11 (rs5945572 and
rs5945619). These results suggest that some loci with
genome-wide significance in non-African descent populations
may not be associated with prostate cancer or may
not have the same magnitude of effect in ADM.

Several SNPs showed statistically significant associations.
SNPs in JAZF1 (rs10486567; OR = 1.18; P = 0.0002),
MSMB (rs10993994; OR = 1.12; P = 0.005), 11q13
(rs10896449; OR = 1.12; P = 0.031 and rs7931342; OR =
1.15; P = 0.014) and NUDT10/11 (rs5945572; OR = 1.11; P =
0.02 and rs5945619; OR = 1.09; P = 0.039) were statistically
significantly associated with prostate cancer risk. Each
of these associations was in the same direction as
those reported in EDM (Table 1).

We also undertook a similar analysis that excluded
data that have been published previously to isolate a
subset of study centers for evaluating further evidence of
independent replications (19, 20). After excluding data
from those studies (i.e., JHU, MEC, Wake-Hu, Wake-NC,
and Wash U), both JAZF1 rs10486567 (P = 0.005) and
MSMB rs10993994 (P = 0.009) remained statistically significant.
In both cases, the OR estimate in the subset
trended away from the null hypothesis (OR = 1.23 in
the subset vs. 1.18 in the total sample, and OR = 1.17 in
the subset vs. 1.12 for the total sample, respectively). SNP
rs10896449 at 11q13 stayed nominally significant (P =
0.02), but SNPs at NUDT10/11 and SNP rs7931342 at
11q13 were no longer significant. These results further
provide support for the association of JAZF1 and MSMB
with prostate cancer risk in ADM. Although we were
unable to mutually adjust for the effects of multiple SNPs
in a single locus for the majority of loci, after mutually
adjusting for multiple SNPs at 11q13, both SNP rs7931342
(OR = 1.0; 95% CI: 0.77-1.30; P = 0.999) and rs10896449
(OR = 1.18; 95% CI: 0.93-1.49; P = 0.17) became nonsignificant.
As the sample size for this last analysis is
smaller than for the overall sample (i.e., n = 2,013 vs. n =
3,954 or 4,463), we were not able to unambiguously
determine which SNP contributed independently to the
association signal seen at this locus. After mutual adjustment,
the point estimates for rs7931342 changed from 1.15
to 1.0 and rs10896449 changed from 1.12 to 1.18. These
results suggest that rs10896449 or other SNPs in tight LD
with rs10896449 may be the SNP that contributes to the
association signal at the 11q13 locus.

Multiple independent
loci on chromosome 8q24 have been identified as playing
a role in prostate cancer etiology. We were able to validate
the association of each of these regions at 8q24 (Fig. 1 and
Supplementary Table 2). We had statistically significant
evidence at the genome-wide association level for associations
with region 2 (rs13254738, rs6983561, and
rs16901979), and statistically significant associations in
region 1 (rs10090154), region 3 (rs6983267 and rs7000448),
region 4 (rs7008482), and the region centromeric to region
2 (rs10086908). We also removed data that had been
included in previous studies (19, 20, 45) of loci at 8q24.
Significant associations remained for Regions 2 (block 2),
3 (block 4), 4 (centromeric to block 1), and the region
centromeric to Region 2 (block 1). However, the marginal
associations in region 1 (block 5) were no longer significant
after the data from the published reports were
excluded.

Figure 1. Results of prostate cancer associations at 8q24 in ADM. P values for association by genomic location.

Because we have studied an admixed population of
ADM, we also investigated potential bias due to population
stratification by comparing the association results
with or without adjusting for percentage of non-African
ancestry estimated from AIMs. Ancestry adjustment analyses
were undertaken in 8 of the 19 centers for which
AIMs data were available (Supplementary Table 3). We
observed significant differences in the proportion of
African ancestry across centers (Kruskal-Wallis chi-square =
339.6; P < 0.0001). However, these differences may reflect
not only known geographic differences in ADM admixture
(46), but also the different ancestry marker panels
and methods used to estimate the ancestry proportions
across centers (Supplementary Table 5). Therefore, we
have performed all analyses with adjustment for center
effects to reduce the impact of different ancestry marker
panels and methods used across centers. Among those
centers with ancestry marker data, inclusion of percent
non-African ancestry did not substantially change the
associations or inferences for any locus compared with
models adjusted only for age and center.

We also evaluated the effect of the GWAS SNPs studied
here on prostate cancer aggressiveness by repeating the
analysis with stratification by clinical (TNM) stage and
histologic (Gleason) grade (Supplementary Table 4). For
SNPs that showed a significant association in the comparisons
of both high grade/stage against controls and
low grade/stage against controls, there were no statistically
significant differences between high- and low-grade/stage
cases. A number of loci were associated with
disease aggressiveness, but in no instance was there
evidence for statistically significant differences in the
associations by disease aggressiveness after correction
for multiple testing (Supplementary Table 4).

We  also  evaluated  whether  there  was  evidence  for  first- 
order  interactions  between  any  of  the  loci  identified  as 
having  a  statistically  significant  main  effect  on  risk  of 
prostate  cancer  (Table  1).  Using  an  additive  (per-allele) 
model  adjusted  for  age  and  study  center,  we  considered 
interactions only among SNPs not in LD. The most significant
interaction identified was between 2 SNPs on
chromosome  8q24:  rs10086908  (centromeric  to  Region  2) 
and  rs6983267  (Region  3;  nominal  P  value  =  0.021). 
However,  after  correction  for  multiple  testing  using  the 
false  discovery  rate  (FDR),  this  interaction  was  no  longer 
significant  (FDR  P  value  =  0.42).  No  other  P  values  for 
interaction  reached  statistical  significance. 
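
The text does not say which FDR procedure was applied to the interaction P values. The sketch below assumes the widely used Benjamini-Hochberg step-up procedure and shows how adjusted P values are obtained from a list of nominal ones; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

As a purely hypothetical illustration of the scale of the adjustment: if a nominal P value of 0.021 were the smallest of 20 tests and the other 19 were all 0.5, its adjusted value under this procedure would be 0.021 x 20 = 0.42. The actual number of interaction tests is not stated in the text.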

Finally,  we  evaluated  whether  there  was  evidence  for 
heterogeneity  in  associations  across  centers  by  generating 
forest  plots  of  the  individual  center  OR  estimates  that 
reached  overall  statistical  significance  (Supplementary 
Fig.  1).  With  very  few  exceptions,  the  associations  that 
reached  any  level  of  significance  showed  remarkable 
consistency  in  the  direction  of  the  risk  estimates.  There 
was  no  statistically  significant  heterogeneity  in  effects 
across  centers  (P  >  0.05  for  all  SNPs). 

Discussion 

A number of recent reports have modeled the role of
genomic markers on prostate cancer susceptibility (1-9).
We have validated a number of these loci, including 8q24,
JAZF1, MSMB, 11q13, and NUDT10/11. In general, the
point estimates of risk at these loci in our current pooled
analysis of 19 studies suggest that the effects of these loci
in ADM are similar to those in EDM. We also observed no
statistically significant heterogeneity of effects across
studies (Supplementary Fig. 1). A number of loci were
not validated in our analysis, despite reaching genome-wide
significance in GWAS of EDM. This discrepancy
may be explained in a number of ways. First, the present
study may not have been powered to identify very small
effects of these loci. However, for a number of loci, we
estimated ORs < 1.0 with 95% CIs that do not overlap the
OR estimates originally reported in EDM. Most of the
remaining nonsignificant associations had
OR < 1.05, lower than the ORs
estimated in EDM. If the effects of these alleles are in fact
smaller in magnitude in ADM than those reported in
EDM, the present study may not have been able to detect
these effects. Second, allele frequencies in EDM and ADM
differ at many of the loci studied here (Table 1), as do
patterns of linkage disequilibrium by ethnicity (47). These
differences also may affect the ability to detect significant
effects at some loci in ADM, where they may have been
detectable in EDM. However, the reverse situation is also
possible (Table 1). Finally, if none of these limitations
applies, it is possible that the loci not validated in the
present study confer susceptibility only in EDM, but not
ADM. Although it is unlikely that there are substantial
biological differences in prostate cancer etiology between
EDM and ADM, interactions of environmental exposures,
prostate cancer screening, and other nongenetic
risk factors may influence the penetrance of these alleles
that may manifest in different risk profiles.

One of the more consistent associations identified to
date is that of rs10993994 at MSMB (10q11; refs. 2, 3),
which is confirmed as a prostate cancer susceptibility
locus in ADM in this study. MSMB is a microseminoprotein
beta gene that encodes PSP94, a nonglycosylated,
cysteine-rich protein that is a member of the immunoglobulin-binding
factor family synthesized by epithelial cells
in the prostate and secreted into seminal plasma (3).
Although the exact function of PSP94 is not well established,
it is postulated to be involved in growth regulation,
gene expression, and apoptosis in prostate cancer
cells (2). PSP94 and its binding protein in serum, PSPBP,
are potential serum markers for both prostate cancer risk
and aggressiveness (48, 49), unlike the current prostate-specific
antigen (PSA) screening which mainly detects the
presence of prostate cancer (48). The effect of rs10993994
on MSMB gene expression has been investigated in functional
studies (5, 40). The prostate cancer risk-associated T
allele of the rs10993994 SNP had only 13% of the promoter
activity of the C allele, and treatment
with increasing concentrations of the synthetic androgen
R1881 resulted in a dose-dependent increase in promoter
activity of the C, but not the T, allele of this SNP. In
addition, tumor cell lines with a CC or CT genotype
showed a high level of MSMB gene expression compared
with cell lines with a TT genotype. These findings were
specific to the alleles of rs10993994 and not from other
SNPs in the proximal promoter of MSMB. The significant
association found in rs10993994 and lack of association
found in 2 other MSMB SNPs included in our study also
suggest the potential of rs10993994 as the causal SNP.
Further fine-mapping studies that take advantage of the
shorter LD pattern in ADM would help test this
hypothesis.

JAZF1 ("juxtaposed with another zinc finger protein 1")
was identified by the Cancer Genetic Markers of Susceptibility
(CGEMS) study as associated with prostate cancer
case-control status (3). This same group has undertaken
fine mapping at this locus and confirmed that the original
GWAS association with rs10486567 (the SNP validated in
ADM here) is likely to be the marker responsible for the
association signal at this locus (50). Because rs10486567
lies in intron 2 of JAZF1 and is not known to alter any
apparent splicing or expression of this gene, the functional
significance of this association has yet to be determined.
JAZF1 has been associated with somatic fusion
proteins in endometrial tumors (51-54), but no other
genomic associations have been reported.

Two previous studies (19, 21) suggested that NUDT10/11
was associated with prostate cancer in ADM. One
study of ADM, not included in the present data, also
reported that SNPs at 11q13 were associated with prostate
cancer in ADM (21). The marginal association
between these 2 loci and prostate cancer in this study
is suggestive of validation with GWAS associations in
European descent populations, but additional data may
be required to fully validate these associations in ADM.

We have also validated the previously reported associations
of multiple regions of chromosome 8q24 and
prostate cancer in ADM. Originally identified by admixture
mapping methods and GWAS (18), this locus has
been shown to be composed of a number of independent
prostate cancer susceptibility regions (11, 42, 55,
56). Multiple regions have been validated in our study,
with the strongest association signals seen in regions 2
and 3, and our findings are consistent with the fine
mapping of the admixture scan (11). The association
signals seen in regions 1, 4, and a region centromeric to
region 2 are much weaker compared with those in
regions 2 and 3.

Finally,  a  number  of  other  loci  did  not  reach  statistical 
significance  in  any  analysis,  and  in  fact  provided  no 
evidence  for  association  with  prostate  cancer  in  ADM. 
These  included  many  loci  that  reached  genome-wide 
levels  of  significance  in  EA  but  had  P  value  >  0.2  (and 
many  with  P  >  0.9)  in  ADM  (Table  1).  These  include 
associations  that  were  reported  by  2  studies  of  ADM  that 
are  included  in  the  present  analysis,  but  did  not  reach 
statistical  significance  in  the  current  combined  data  set, 
including  KLK2/3  and  HNF1B/TCF2  (19,  20). 

It is possible that a number of these statistically nonsignificant
associations were underpowered in the present
sample, especially those based on loci with lower
minor allele frequencies. However, the adjusted OR estimates
in ADM were often substantially lower than those
reported in EA men (Table 1). Indeed, some risk estimates
that had been estimated in EA men to be OR > 1 were
estimated in ADM to be OR < 1, suggesting no evidence
for a comparable association between the 2 groups.


There are a number of possible explanations for these
findings. First, the loci identified in GWAS studies of
EDM populations could represent false-positive associations
that cannot be replicated in ADM. Given the
large sample sizes in replication studies and strong P
values associated with these loci in previous reports,
this is an unlikely scenario. Second, there may be real
heterogeneity in prostate cancer etiology that may be
reflected by differences in allele frequency (i.e., ability
to detect associations) or differences in the context in
which these alleles are acting in EDM versus ADM due
to differences in environmental exposures, lifestyle, or
other effect modifiers not measured in studies to date.
The present data do not allow us to address whether
prostate cancer in ADM is less strongly influenced by
genes relative to other factors than in EDM. However,
the present results should be considered in future studies
that may attempt to address this hypothesis. Third,
the causal variants may not have been identified and
genotyped yet, and the causal variants may be different
in EDM and ADM. This question cannot be resolved by
the data presented here and will require additional fine-mapping
studies as well as ADM-specific GWAS studies
in which existing GWAS loci may be validated and
new loci may be identified.

Despite the validation of some prostate cancer loci in
ADM, there was no strong evidence that these loci had
different effects on advanced (e.g., high stage or grade)
disease compared with less advanced disease (e.g., low
stage or grade). This may, in part, be due to the limited
power to detect significant differences between men with
more versus less aggressive disease features. In some
cases, there were suggestions that some SNPs were associated
with more aggressive disease, including a number
of SNPs at Chr. 8q24 (rs6981122, rs7000448, rs16901896) as
well as others such as rs7904463 (Chr. 10) and rs5945572
(Chr. X). In these cases, there is a suggestion of stronger
associations in more versus less aggressive disease in a
case-control study design, but there were no statistically
significant differences observed between more and less
aggressive cases in a case-case comparison. Similarly,
there were a number of loci for which the association was
stronger for less advanced disease compared with more
advanced disease. These included the associations for
rs9623117 at 22q13, MSMB and JAZF1 SNPs, for which
the overall significant association among all cases combined
(Table 1) appeared to exist only in cases with less
aggressive features (Supplementary Table 4). Our results
in ADM are consistent with the report by Kader et al. (57)
that showed the majority of currently identified GWAS
risk-associated SNPs could not differentiate aggressive
from less aggressive diseases in EDM. However, contrary
to the significant finding in that report showing that SNPs
in KLK2/3 and MSMB, both related to serum PSA levels,
were associated with less aggressive disease, our null
finding in KLK2/3 and MSMB implies that PSA screening
may not introduce the same degree of bias in cancer
detection in ADM as seen in EDM.

In studying an admixed population of ADM, there
is a concern for potential bias due to confounding by
ethnicity (i.e., population stratification). To address the
potential that there is bias in the risk estimates, we
undertook a subset analysis of those centers that had
genotyped ancestry markers and estimated the proportion
of African ancestry. We observed no substantial bias
in the estimates of association for any SNP. In fact,
compared with associations adjusted only for age and
center, the odds ratios for 7 of 47 (15%) associations
adjusted for age, center, and percent non-African ancestry
changed by 5% or more: 3 of these estimates moved
away from the null hypothesis whereas 4 of these estimates
changed toward the null. These empirical data
suggest that the potential for bias due to population
stratification is not large, and that the direction of this
bias may not always be away from the null hypothesis.
None of these SNPs was significantly associated with the
probability of having prostate cancer before or after
adjustment for ancestry, so the consideration of ancestry
did not change any inferences based on our results.
Limitations of the approach used here include the use
of different sets of markers and approaches to estimating
African ancestry in only a subset of the available studies.
However, our data provide no evidence for substantial
bias due to population stratification in associations of
GWAS SNPs in prostate cancer etiology.
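
The comparison of estimates with and without ancestry adjustment can be expressed as a small tally. The sketch below is illustrative only: the function names are hypothetical, the 5% threshold is taken from the text, and "away from the null" is interpreted here as a larger absolute log odds ratio, which is an assumption about how the direction of change was judged.

```python
import math


def relative_change(or_base, or_adjusted):
    """Relative change in an odds ratio after additional adjustment."""
    return (or_adjusted - or_base) / or_base


def moved_away_from_null(or_base, or_adjusted):
    """True if the adjusted OR is farther from 1.0 on the log scale."""
    return abs(math.log(or_adjusted)) > abs(math.log(or_base))


def summarize_shifts(pairs, threshold=0.05):
    """Count ORs that changed by at least `threshold`, split by direction.

    `pairs` holds (OR adjusted for age and center,
                   OR adjusted for age, center, and ancestry).
    Returns (n_away_from_null, n_toward_null).
    """
    away = toward = 0
    for or_base, or_adjusted in pairs:
        if abs(relative_change(or_base, or_adjusted)) >= threshold:
            if moved_away_from_null(or_base, or_adjusted):
                away += 1
            else:
                toward += 1
    return away, toward
```

Applied to the 47 pairs of estimates, a tally of this kind would yield the counts reported in the paragraph above (3 away from the null and 4 toward it).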

In conclusion, we have validated in ADM the associations of some, but not all,
prostate cancer susceptibility loci originally identified in non-African descent
populations. The finding that the genetic etiology of prostate cancer may be
different in ADM and EDM suggests that studies that take advantage of the shorter
LD blocks in ADM or more complete resequencing efforts will facilitate
identification of causal variants in verified risk loci.

Genome-wide association study of prostate cancer in men
of African ancestry identifies a susceptibility locus at 17q21

In search of common risk alleles for prostate cancer that could contribute to high
rates of the disease in men of African ancestry, we conducted a genome-wide
association study, with 1,047,986 SNP markers examined in 3,425 African-Americans
with prostate cancer (cases) and 3,290 African-American male controls. We followed
up the most significant 17 new associations from stage 1 in 1,844 cases and 3,269
controls of African ancestry. We identified a new risk variant on chromosome 17q21
(rs7210100, odds ratio per allele = 1.51, P = 3.4 × 10^-13). The frequency of the
risk allele is ~5% in men of African descent, whereas it is rare in other
populations (<1%). Further studies are needed to investigate the biological
contribution of this allele to prostate cancer risk. These findings emphasize the
importance of conducting genome-wide association studies in diverse populations.

Genome-wide association studies (GWAS) of prostate cancer have identified more
than 30 variants associated with risk that, in aggregate, are estimated to account
for approximately 20% of the familial risk of prostate cancer1-12. Aside from
admixture and fine-mapping studies that identified multiple independent risk
variants at 8q24 (refs. 13,14), and a more recent GWAS among Japanese men that
identified five new loci9, discoveries in prostate cancer have come from studies
in men of European ancestry. However, prostate cancer incidence in men of African
ancestry is greater than in non-African populations15, with the disparity
presumably reflecting both differences in prevalence of environmental risk factors
and susceptibility alleles that are shared among men of African descent. For
example, the risk variants at 8q24, many of which are more common in men of
African ancestry14, could contribute partly to the greater incidence of prostate
cancer in this population and provide some support for the hypothesis of a genetic
contribution underlying racial and ethnic disparities in disease risk.

We assembled a consortium of prostate cancer studies that included men of African
ancestry and conducted a GWAS to search for additional risk loci that may be more
common in men of African descent. Stage 1 included 3,621 African-American cases
with prostate cancer and 3,502 African-American controls drawn from 11 studies
(Supplementary Table 1 and Online Methods). We conducted genotyping in stage 1
using the Illumina Infinium 1M-Duo. Following quality-control exclusions (Online
Methods), the stage 1 analysis consisted of 1,047,986 SNPs (minor allele frequency
>0.01) examined in 3,425 cases and 3,290 controls.

Figure 1  A plot of the -log10 P values by chromosome.

Table 1  The association of variant rs7210100 at 17q21 with prostate cancer risk in men
of African ancestry

Study          Cases/controls(a)  RAF in controls  OR(b)  95% CI(b)   P(c)           Phet(d)

Stage 1 studies
MEC            1,060/1,055        0.04             1.58   1.21-2.08   8.8 × 10^-4
SCCS           201/412            0.05             1.40   0.85-2.31   0.19
PLCO           227/239            0.05             1.44   0.82-2.52   0.21
CPS-II         64/112             0.07             0.66   0.24-1.78   0.41
MDA            527/437            0.05             1.39   0.95-2.02   0.089
IPCG           354/157            0.05             1.54   0.84-2.82   0.17
LAAPC          288/287            0.06             0.94   0.57-1.56   0.81
CaP Genes      71/85              0.06             1.72   0.78-3.82   0.18
DCPC           263/341            0.07             1.14   0.75-1.75   0.54
KCPCS          141/75             0.05             0.95   0.42-2.16   0.90
GECAP          224/89             0.05             2.47   1.14-5.34   0.022
Combined       3,420/3,289                         1.40   1.21-1.62   5.2 × 10^-6    0.89

Stage 2 studies
SFPCS          86/36              0.04             1.86   0.53-6.55   0.34
FMHS           125/339            0.06             1.70   0.98-2.93   0.058
MEC-LAC        551/555            0.04             1.92   1.30-2.83   9.7 × 10^-4
NCPCS          214/249            0.06             0.92   0.51-1.66   0.79
WFPCS          58/65              0.04             1.90   0.56-6.42   0.30
WUPCS          73/153             0.04             1.96   0.76-5.03   0.16
GHS            264/964            0.07             1.37   0.94-2.01   0.11
Combined       1,371/2,361                         1.55   1.26-1.89   2.5 × 10^-5    0.25

Stage 3 studies
SCORE          146/267            0.05             1.58   0.88-2.83   0.13
PROGRES        79/395             0.05             2.64   1.36-5.10   4.0 × 10^-3
PCBP           246/242            0.05             2.02   1.20-3.39   7.9 × 10^-3
Combined       471/904                             2.07   1.49-2.88   1.5 × 10^-5    0.51

Stages 1+2+3   5,262/6,554                         1.51   1.35-1.69   3.4 × 10^-13   0.58

(a) Number of cases and controls with genotype data for rs7210100. (b) Adjusted for age
and eigenvectors 1-10 in stage 1 (and study in pooled analysis). Adjusted for age in
stage 2 and stage 3. Adjusted for age and study in stage 1+2+3 analysis. (c) P for trend
(1 degree of freedom). (d) Test of heterogeneity. RAF, risk allele frequency; OR, odds
ratio; 95% CI, 95% confidence interval.

In comparing (for all SNPs) the observed with the expected distribution of P values
from a 1-degree-of-freedom trend test, there was evidence of inflation in the test
statistic (λ = 1.11). Principal components analysis highlighted the high degree of
admixture in this population, and the overinflation diminished following additional
adjustment for ancestry (λ = 1.03; Supplementary Fig. 1 and Online Methods). The
association of four SNPs achieved genome-wide significance in the stage 1 sample,
with P values between P = 5.4 × 10^-9 and P = 5.7 × 10^-13 (Fig. 1). These SNPs are
located in known prostate cancer risk regions, three of which are at 8q24
(rs10505483, rs1456315 and rs7824364 at 128.173-128.205 Mb (NCBI36)) and one of
which is at 11q13 (rs7130881 at 67.75 Mb).

We selected 17 SNPs (P < 2 × 10^-5) located outside of known prostate cancer risk
regions to examine in a second stage. The associations of these 17 SNPs with
prostate cancer risk were not influenced substantially by population stratification
in the stage 1 sample as evaluated by principal components analysis (Supplementary
Table 2). The stage 2 sample included 1,396 cases and 2,383 controls of African
ancestry from seven independent studies: six US-based studies and one study in
Ghana. Of the 17 SNPs, only marker rs7210100 at 17q21 was significantly associated
with risk in the stage 2 studies (odds ratio (OR) = 1.55, P = 2.5 × 10^-5; Table 1).
None of the other SNPs selected in stage 1 were significantly associated with risk
in the stage 2 sample (all P values were >0.05); we excluded rs13116912 because it
deviated from Hardy-Weinberg equilibrium in the majority of stage 2 studies. The
results for all 17 SNPs in stage 1 and stage 2 are presented in Supplementary
Table 3.

We further examined the association with rs7210100 in a third stage that included
three studies among men of African descent: a study from the United States (SCORE),
a study in Senegal (PROGRES) and a study in Barbados (PCBP). We found rs7210100 to
be positively associated with risk in all three studies (stage 3, 471 cases and 904
controls; combined OR = 2.07, P = 1.5 × 10^-5; Table 1).

Adjustment for global ancestry or local ancestry (African versus European) in the
stage 1 studies did not influence the results for rs7210100 (OR = 1.41 without
adjustment for ancestry, OR = 1.40 adjusted for global ancestry and OR = 1.43
adjusted for global and local ancestry). The effect estimate for rs7210100 was also
similar in men with <15% global European ancestry (1,251 cases and 1,325 controls;
OR = 1.41) as well as in cases and controls estimated to have two chromosomes of
African ancestry at this location (2,214 cases and 2,080 controls; OR = 1.47). We
observed no evidence of heterogeneity of the association by study for this variant
in the stage 1 (Phet = 0.89), stage 2 (Phet = 0.25) or stage 3 studies (Phet = 0.51)
or among all studies (Phet = 0.58). Results for all SNPs examined in the replication
stages were also unaffected when adjusting for European ancestry in studies in which
information on global ancestry was available (Supplementary Tables 4 and 5).

In combining the results across all three stages (5,262 cases and 6,554 controls),
rs7210100 was strongly and significantly associated with risk (OR = 1.51, 95% CI
1.35-1.69, P = 3.4 × 10^-13). The ORs for heterozygote and homozygote carriers were
1.49 (95% CI 1.32-1.68) and 2.73 (95% CI 1.50-4.96), respectively. We did not find
any stronger signal with SNPs imputed to the phase 2 HapMap populations in the
surrounding region at chromosome 17q21 (Fig. 2 and Supplementary Fig. 2).
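
As a rough cross-check of the combined estimate, the three stage-level ORs in Table 1 can be pooled with a fixed-effect inverse-variance calculation. This is only a sketch: the authors pooled individual-level data with logistic regression, whereas the code below recovers standard errors from the published 95% confidence intervals (an assumption of this example) and combines them on the log scale. The function names are illustrative, not from the paper.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def pool_fixed_effect(estimates, z=Z_95):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) tuples."""
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper, z) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    log_or = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Stage-level combined estimates as reported in Table 1 (stages 1, 2 and 3).
stages = [(1.40, 1.21, 1.62), (1.55, 1.26, 1.89), (2.07, 1.49, 2.88)]
pooled_or, pooled_lo, pooled_hi = pool_fixed_effect(stages)
print(f"pooled OR = {pooled_or:.2f} ({pooled_lo:.2f}-{pooled_hi:.2f})")
```

Under these assumptions the pooled value lands very close to the reported OR of 1.51 (95% CI 1.35-1.69), which suggests the stage-level and overall results are mutually consistent.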

The association with rs7210100 was similar when stratifying on age (P = 0.72) and
first-degree family history of prostate cancer (P = 0.36). We also observed no
significant difference in the association of rs7210100 with prostate cancer stage
(P = 0.94) or tumor grade (P = 0.11) at diagnosis. However, the association with
rs7210100 was greater for non-advanced disease when classified based on stage and
grade (Gleason score <8 and localized stage, 2,433 cases and 6,554 controls,
OR = 1.67, P = 8.6 × 10^-12) than for advanced disease (Gleason score ≥8 or
non-localized disease, 1,719 cases and 6,554 controls, OR = 1.27, P = 5.0 × 10^-3,
Phet = 6.0 × 10^-3).

Among controls with prostate-specific antigen (PSA) levels measured at <4 ng/ml
(n = 2,383), we found no significant association between PSA levels and rs7210100
genotype (P = 0.58). Limiting the analysis to controls with PSA levels <4 ng/ml and
cases from these studies did not change the association between rs7210100 and
prostate cancer risk (n = 3,157 cases and 2,383 controls, OR = 1.62,
P = 4.5 × 10^-8).

The variant rs7210100 is located in intron 1 of ZNF652 on chromosome 17q21.32.
ZNF652 encodes a zinc-finger protein transcription factor that has been shown to
interact with the eight-twenty-one (ETO) protein, CBFA2T3, which acts as a
transcriptional repressor by forming complexes with co-repressor proteins and
HDACs16. Co-expression of ZNF652 and the androgen receptor in prostate tumors has
been associated with a decrease in relapse-free survival17. A common variant just
upstream of ZNF652 has also been associated with blood pressure in a GWAS of men and
women of European ancestry18. Sequencing of the five coding exons of ZNF652 in 48
subjects (with an oversampling of risk allele carriers; Online Methods) did not
reveal a coding variant strongly correlated with rs7210100. Further work is needed
to map this locus in order to nominate optimal candidate markers, in addition to
rs7210100, for functional studies in pursuit of regulatory effects of one or more
variants in the region.

The risk allele of rs7210100 is relatively uncommon in men of African ancestry
(4-7%), and is extremely rare (<1%) in non-African populations as reported by the
1000 Genomes Project. The frequency of the risk allele in men of west-African
ancestry (Ghana and Senegal) is very similar to that observed in African Americans,
as well as in men from east Africa (Uganda; n = 111, risk allele frequency = 0.04).
GWAS in populations of European ancestry have not pointed to this region of 17q21
as a risk locus for prostate cancer (Supplementary Fig. 3). Together, these
observations suggest that the underlying biologically relevant allele may be limited
to populations of African descent. As reported by the National Cancer Institute's
Surveillance, Epidemiology and End Results (SEER) Program, prostate cancer incidence
in African-American men is 1.56 times higher than the incidence in non-Hispanic
individuals of European descent. Because approximately 10% of African-American men
carry this variant, which increases their risk 1.50-fold over non-carriers, we
estimate that this locus may be responsible for as much as 9% (95% CI 6-12%) of the
greater incidence of prostate cancer in African-American men (Online Methods).

In summary, we detected a marker of risk for prostate cancer that appears specific
to men of African descent, who have an increased incidence and mortality of this
disease. These findings provide strong support for conducting GWAS in diverse
populations to identify markers of risk that may be population specific and which
could contribute to racial and ethnic disparities in disease incidence. Further
work is needed to characterize the 17q21 region and conduct the functional studies
required to understand the role of this germ-line variation in prostate cancer
susceptibility.

URLs. SEER, http://seer.cancer.gov/; LocusZoom, http://csg.sph.umich.edu/locuszoom/;
PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/; EIGENSTRAT,
http://genepath.med.harvard.edu/~reich/Software.htm.

METHODS

Methods and any associated references are available in the online version of the
paper at http://www.nature.com/naturegenetics/.

Note: Supplementary information is available on the Nature Genetics website.

Figure 2  A regional plot of the -log10 P values for genotyped (squares) and imputed
(circles) SNPs at the chromosome 17q21 risk locus in the stage 1 African-American
sample. The shading depicts the strength of the correlation (r²) between rs7210100
and the SNPs tested in the region. The correlation is estimated in the YRI
population from the 1000 Genomes Project (June 2010). Also shown are human genome
build 18 coordinates (Mb), recombination rates in cM per Mb and genes in the region.
The plot was generated using LocusZoom.

ONLINE METHODS

Studies. The studies included in stage 1 were drawn from 11 epidemiological studies
of prostate cancer among African-American men. These studies included The
Multiethnic Cohort (MEC; 1,094 cases and 1,096 controls), The Southern Community
Cohort Study (SCCS, 212 cases and 419 controls), The Prostate, Lung, Colorectal and
Ovarian Cancer Screening Trial (PLCO, 286 cases and 269 controls), The Cancer
Prevention Study II Nutrition Cohort (CPS-II, 76 cases and 152 controls), Prostate
Cancer Case-Control Studies at MD Anderson (MDA, 543 cases and 474 controls),
Identifying Prostate Cancer Genes (IPCG, 368 cases and 172 controls), The Los
Angeles Study of Aggressive Prostate Cancer (LAAPC, 296 cases and 303 controls),
Prostate Cancer Genetics Study (CaP Genes, 75 cases and 85 controls), Case-Control
Study of Prostate Cancer among African Americans in Washington, DC (DCPC, 292 cases
and 359 controls), King County (Washington) Prostate Cancer Study (KCPCS, 145 cases
and 81 controls) and The Gene-Environment Interaction in Prostate Cancer Study
(GECAP, 234 cases and 92 controls). These studies provided DNA samples for 3,621
cases and 3,502 controls.

Stage 2 included 1,396 cases and 2,383 controls from seven studies: San Francisco
Bay Area Prostate Cancer Study (SFPCS, 86 cases and 37 controls), The Flint Men's
Health Study (FMHS, 135 cases and 353 controls), The Multiethnic Cohort/Los Angeles
County (MEC-LA, 554 cases and 557 controls), North Carolina Prostate Cancer Study
(NCPCS, 214 cases and 249 controls), Wake Forest University Prostate Cancer Study
(WFPCS, 59 cases and 66 controls), Washington University Prostate Cancer Study
(WUPCS, 75 cases and 153 controls) and The Ghana Men's Health Study (GHS, 271 cases
and 968 controls). Stage 3 included 484 cases and 947 controls from three studies:
The Study of Clinical Outcomes, Risk and Ethnicity (SCORE, 152 cases and 280
controls), Prostate-Genetique-Recherche-Senegal (PROGRES, 86 cases and 414 controls)
and Prostate Cancer in a Black Population (PCBP, 246 cases and 253 controls).
Detailed information about the design and organization of each study is provided in
the Supplementary Note.

Genotyping and quality control. Genotyping in stage 1 (3,621 cases and 3,502
controls) was conducted using the Illumina Infinium Human1M-Duo. Samples (n = 408)
were removed based on the following exclusion criteria: (i) unknown replicates
across studies, (ii) call rates <95%, (iii) >10% mean heterozygosity on the X
chromosome and/or <10% mean intensity on the Y chromosome, (iv) ancestry outliers
and (v) samples that were related (discussed below). The concordance rate for 158
replicate samples was 99.99%. Starting with 1,153,397 SNPs, we removed SNPs with
<95% call rate, minor allele frequencies <1% or >1 quality-control mismatch based on
sample replicates (n = 105,411). The analysis included 1,047,986 SNPs among 3,425
cases and 3,290 controls.
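
The SNP-level exclusions above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele with `None` for a missing call; the function names and this encoding are hypothetical and are not taken from the study's pipeline. The replicate-mismatch criterion is not modeled here.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype (None means missing)."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Minor allele frequency from genotypes coded as 0, 1 or 2 allele copies."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)

def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Keep a SNP only if call rate is at least 95% and MAF is at least 1%."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)

common_snp = [0] * 90 + [1] * 10                  # MAF 0.05, fully called
rare_snp = [0] * 199 + [1]                        # MAF 0.0025
poorly_called = [0] * 80 + [1] * 10 + [None] * 10 # call rate 0.90

print(passes_qc(common_snp), passes_qc(rare_snp), passes_qc(poorly_called))

# Bookkeeping from the text: SNPs before QC minus SNPs removed.
snps_remaining = 1_153_397 - 105_411
```

The bookkeeping line reproduces the SNP count reported in the text (1,047,986).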

We used PLINK (see URLs) to calculate the probabilities of sharing 0, 1 and 2
alleles (Z = Z0, Z1, Z2) across all possible pairs of samples to determine
individuals who were likely to be related to others within and across studies. We
identified 167 pairs of related subjects (monozygotic twin, parent-offspring, full-
and half-sibling pairs) based on the values of their observed probability vector Z
being within 1 standard deviation of the expected values of Z for their respective
relationship. Individuals who were connected with a higher number of pairs were
chosen for removal. In all other cases, one of the two members was randomly
selected for removal. A total of 141 subjects were removed.
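
The relatedness screen described above can be sketched as follows. The expected sharing vectors are the standard theoretical values for each relationship; the `tolerance` argument is a hypothetical stand-in for the 1-standard-deviation band (the standard deviations are not given in the text), and the greedy removal loop is one plausible reading of the stated rule, not the authors' code.

```python
import random
from collections import Counter

# Theoretical IBD-sharing vectors (Z0, Z1, Z2) for each relationship type.
EXPECTED_Z = {
    "monozygotic twin": (0.0, 0.0, 1.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibling": (0.25, 0.5, 0.25),
    "half sibling": (0.5, 0.5, 0.0),
}

def classify_pair(z, tolerance=0.1):
    """Return the relationship whose expected Z matches `z` within `tolerance`."""
    for name, expected in EXPECTED_Z.items():
        if all(abs(obs - exp) <= tolerance for obs, exp in zip(z, expected)):
            return name
    return None  # treated as unrelated in this sketch

def select_removals(pairs, seed=0):
    """Greedily remove the individual involved in the most related pairs.

    Ties (including simple one-off pairs) are broken at random.
    """
    rng = random.Random(seed)
    remaining = [tuple(p) for p in pairs]
    removed = set()
    while remaining:
        degree = Counter(ind for pair in remaining for ind in pair)
        top = max(degree.values())
        candidates = sorted(ind for ind, d in degree.items() if d == top)
        chosen = rng.choice(candidates)
        removed.add(chosen)
        remaining = [p for p in remaining if chosen not in p]
    return removed

related_pairs = [("a", "b"), ("a", "c"), ("d", "e")]
to_remove = select_removals(related_pairs)
```

In this toy input, individual `a` is removed because it appears in two pairs, and one of `d` or `e` is removed at random, which mirrors how 167 related pairs can be resolved by removing fewer than 167 subjects.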

The EIGENSTRAT (see URLs) software was used to calculate eigenvectors that explained
genetic differences in ancestry among samples in the study19. We included data from
both HapMap populations (CEPH (Utah residents with ancestry from northern and
western Europe) (CEU), Japanese in Tokyo, Japan (JPT), Yoruba in Ibadan, Nigeria
(YRI) and African ancestry in the Southwestern United States (ASW)) and our study so
that comparisons to reference populations of known ethnicity could be made. A total
of 2,546 ancestry-informative SNPs from the Illumina array were selected based on
low inter-marker correlation and ability to differentiate between samples of African
and European descent. An individual was subject to filtering from the analysis if
his value along eigenvector 1 or 2 was outside of 4 standard deviations from the
mean of each respective eigenvector. We identified 108 individuals who met this
criterion. Eigenvector 1 was highly correlated (ρ = 0.997, P < 1 × 10^-16) with
percentage of European ancestry, estimated in HAPMIX20. Together, the top ten
eigenvectors explain 21% of the global genetic variability among subjects.
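
The ancestry-outlier rule (more than 4 standard deviations from the mean along eigenvector 1 or 2) can be sketched in a few lines. The function name and the use of the sample standard deviation are assumptions of this example; EIGENSTRAT's own outlier handling may differ in detail.

```python
import statistics

def ancestry_outliers(eigvec1, eigvec2, n_sd=4.0):
    """Indices of samples more than n_sd SDs from the mean on eigenvector 1 or 2."""
    flagged = set()
    for values in (eigvec1, eigvec2):
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        for index, value in enumerate(values):
            if abs(value - mean) > n_sd * sd:
                flagged.add(index)
    return sorted(flagged)

# Toy data: sample 49 is far from the rest along eigenvector 1 only.
ev1 = [0.0] * 49 + [10.0]
ev2 = [(-1.0) ** i for i in range(50)]
outliers = ancestry_outliers(ev1, ev2)
print(outliers)
```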


Genotyping in the stage 2 and 3 studies was conducted using the TaqMan allelic
discrimination assay. In stage 2, we removed samples missing data for greater than
three SNPs (n = 36). To assess genotyping reproducibility, each study included
replicate samples; the concordance was >98% for each SNP within each study.
rs13116912 deviated from Hardy-Weinberg equilibrium in all but one of the stage 2
studies and was removed from the stage 2 analysis. No other SNP deviated from
Hardy-Weinberg equilibrium (P < 0.01 in more than two studies) in stage 1 or 2. The
call rate for rs7210100 was very high in stage 1 (99.9%) and was similar in cases
(99.9%) and controls (99.9%). The call rate for this SNP was also very high in
stages 2 (99.8% overall, 99.9% in cases and 99.8% in controls) and 3 (96.1% overall,
97.3% in cases and 95.5% in controls).
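
A Hardy-Weinberg check of the kind referred to above can be sketched with a 1-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The text does not say which Hardy-Weinberg test was used (an exact test is also common), so the choice of the chi-square version and the function name are assumptions of this example.

```python
import math

def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2_ok, p_ok = hwe_chi2_test(25, 50, 25)    # counts exactly in HWE proportions
chi2_bad, p_bad = hwe_chi2_test(50, 0, 50)   # no heterozygotes at all
print(p_ok, p_bad < 0.01)
```

A SNP would be flagged in a given study when the P value falls below the 0.01 threshold quoted in the text.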

Sequencing. Bi-directional sequencing of rs7210100 and the five coding exons of
ZNF652 was performed in 48 subjects (20 homozygous for the risk variant, 20
heterozygous for the risk variant and 8 homozygous for the wild-type allele).
Primers were designed at least 50 bases upstream and downstream from each exon.

Statistical analysis. In stage 1, we tested the association of each SNP and prostate
cancer risk using a 1-degree-of-freedom χ² likelihood ratio test from a logistic
regression analysis adjusted for age, study and the first ten eigenvectors estimated
by principal components analysis19. Overinflation of the test statistic was examined
with and without adjustment for ancestry and was visualized with quantile-quantile
plots. Lambdas were estimated as the median of the test statistics divided by 0.456
(the median of the 1-degree-of-freedom χ² null distribution). Age-adjusted ORs and
95% CIs for each SNP were estimated from the same logistic regression model. At each
locus and for each participant, local ancestry was defined as the estimated number
of European chromosomes (continuous between 0 and 2) carried by the participant
estimated using the HAPMIX program20. Local ancestry at the 17q21 locus was
evaluated as a confounder in the analysis of rs7210100.
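
The inflation factor described above is a one-line calculation once the per-SNP test statistics are available. The sketch below uses the null median quoted in the text (0.456); the function name and the toy input are illustrative only.

```python
import statistics

CHI2_1DF_NULL_MEDIAN = 0.456  # value quoted in the text

def genomic_inflation(chi2_stats, null_median=CHI2_1DF_NULL_MEDIAN):
    """Lambda: median of the observed 1-d.f. test statistics over the null median."""
    return statistics.median(chi2_stats) / null_median

# Toy statistics whose median equals the null median, then inflated by 11%.
null_like = [0.1, 0.456, 2.0]
inflated = [1.11 * value for value in null_like]

lambda_null = genomic_inflation(null_like)
lambda_inflated = genomic_inflation(inflated)
```

A lambda near 1 indicates little inflation; the values of 1.11 and 1.03 reported for stage 1 correspond to the statistics before and after adjustment for ancestry.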

Phased  haplotype  data  from  the  founders  of  the  CEU  and  YRI  HapMap 
phase  2  samples  were  used  to  infer  linkage  disequilibrium  patterns  in  order 
to  impute  untyped  markers.  We  carried  out  genome-wide  imputation  using 
the  software  MACH21.  The  Rsq  metric  was  used  as  a  threshold  in  determining 
which  SNPs  to  filter  from  analysis  (Rsq  <  0.3).  Imputed  SNPs  in  the  17q21 
risk  region,  as  shown  in  Figure  2,  were  examined  in  association  with  prostate 
cancer  risk  as  described  for  typed  SNPs  above. 

In  stage  2,  the  SNPs  were  analyzed  using  logistic  regression  controlling  for  age 
and  study  (in  the  pooled  analysis).  Information  regarding  European  ancestry 
was  available  for  seven  studies  included  in  stages  2  and  3.  As  observed  in  stage  1 
(Supplementary  Table  2),  the  OR  for  rs7210100  was  similar  with  and  without 
adjustment  for  estimated  European  ancestry  in  these  studies  (Supplementary 
Table  4).  The  results  for  rs7210100  in  stage  2,  stage  3  and  stages  1,  2  and  3 
combined  are  presented  without  adjustment  for  ancestry.  Heterogeneity  of  the 
OR  across  studies  was  evaluated  using  a  likelihood  ratio  test. 

Effect modification by age and first-degree family history of prostate cancer was
assessed in stratified analyses, and significance was determined by comparing the
model with and without the cross-product term using a likelihood ratio test. We also
examined the association of rs7210100 genotype with stage, Gleason score and the
combination of stage and grade, with advanced disease defined as Gleason score ≥8 or
stage ≥2 (non-localized disease), and non-advanced disease defined as Gleason score
<8 and stage = 1 (localized disease). A case-only analysis was used to test for
differences in the association of rs7210100 with disease phenotypes. The association
of rs7210100 with least-squares geometric-mean PSA levels was examined using
multiple linear regression adjusting for age, body mass index and study.

We estimated the risk ratio between populations of different ancestral origin
(African or European) caused by rs7210100 as

RR = [(1 - p_A)^2 + 2 p_A (1 - p_A) RR_1 + p_A^2 RR_2] / [(1 - p_E)^2 + 2 p_E (1 - p_E) RR_1 + p_E^2 RR_2].

Here p_A is the risk allele frequency in African origin populations, p_E is the risk
allele frequency in European populations, RR_1 is the relative risk associated with
carrying one copy of the risk allele (compared to none) and RR_2 is the relative
risk associated with carrying two copies of the risk allele. We used values
p_A = 0.05, p_E = 0, RR_1 = 1.5 and RR_2 = 1.5^2 so that the risk ratio between
populations caused by the influence of this risk allele was estimated to be equal to
1.050625. Using the SEER incidence rates of prostate cancer in African Americans
(234.6 per 100,000) and non-Hispanic individuals of European ancestry (150.4 cases
per 100,000), we estimated the ratio of risks between these populations as
234.6/150.4 = 1.56. The percentage of greater risk to African Americans that may be
associated with rs7210100 was estimated as
{1 - [(1.56 - 1.050625)/(1.56 - 1)]} × 100 ≈ 9%.
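
The arithmetic in this paragraph can be reproduced directly. The sketch below plugs in the values stated in the text (risk allele frequencies 0.05 and 0, relative risks 1.5 and 1.5 squared, and the rounded incidence ratio 1.56); the function names are illustrative and not from the paper.

```python
def population_risk_ratio(p_a, p_e, rr1, rr2):
    """Ratio of mean genotype-weighted risk between two populations."""
    def mean_risk(p):
        # Hardy-Weinberg genotype frequencies weighted by genotype relative risks.
        return (1 - p) ** 2 + 2 * p * (1 - p) * rr1 + p ** 2 * rr2
    return mean_risk(p_a) / mean_risk(p_e)

def percent_of_excess_explained(observed_ratio, allele_ratio):
    """Share of the excess incidence ratio attributable to the allele, in percent."""
    return (1 - (observed_ratio - allele_ratio) / (observed_ratio - 1)) * 100

allele_rr = population_risk_ratio(p_a=0.05, p_e=0.0, rr1=1.5, rr2=1.5 ** 2)
explained = percent_of_excess_explained(observed_ratio=1.56, allele_ratio=allele_rr)
print(f"RR = {allele_rr:.6f}, explained = {explained:.1f}%")
```

With these inputs the allele-driven risk ratio is 1.050625 and the explained share is about 9%, matching the figures quoted in the text. The reported confidence interval (6-12%) is not reproduced here because the text does not describe how it was derived.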
Recombination, together with mutation, gives rise to genetic variation in populations. Here we leverage the recent
mixture of people of African and European ancestry in the Americas to build a genetic map measuring the probability of
crossing over at each position in the genome, based on about 2.1 million crossovers in 30,000 unrelated African
Americans. At intervals of more than three megabases it is nearly identical to a map built in Europeans. At finer scales
it differs significantly, and we identify about 2,500 recombination hotspots that are active in people of West African
ancestry but nearly inactive in Europeans. The probability of a crossover at these hotspots is almost fully controlled by the
alleles an individual carries at PRDM9 (P value < 10^-245). We identify a 17-base-pair DNA sequence motif that is enriched
in these hotspots, and is an excellent match to the predicted binding target of PRDM9 alleles common in West Africans
and rare in Europeans. Sites of this motif are predicted to be risk loci for disease-causing genomic rearrangements in
individuals carrying these alleles. More generally, this map provides a resource for research in human genetic variation
and evolution.


In humans and many other species, recombination is not evenly distributed across the
genome, but instead occurs in 'hotspots': 2-kilobase (kb) segments where the
crossover rate is far higher than in the flanking DNA sequence1-3. The
highest-resolution genetic map in contemporary humans so far (the deCODE map) is
based on about 500,000 crossovers identified in 15,000 Icelandic meioses4. However,
a limitation of maps built in people of European descent4-6 is that they may not
apply equally well in other populations, as suggested by comparisons of maps across
ethnic groups4,7-9 and patterns of linkage disequilibrium breakdown, which indicate
that more of the genome may be recombinationally active in West Africans10. It is
known that a major determinant of the positions of recombination hotspots is PRDM9,
a meiosis-specific histone H3 methyltransferase whose zinc finger (ZF) domain binds
DNA sequence motifs11-13. In Europeans, PRDM9 ZF arrays are predominantly of two
similar types, A and B, both of which bind the 13-bp motif CCNCCNTNNCCNC11. In
contrast, 36% of West African alleles are not of the A or B type9,13. Sperm typing
of males who carry neither the A nor the B allele has shown no evidence of crossover
activity at recombination hotspots associated with the 13-bp motif9.

Building  an  African-American  genetic  map 

To investigate differences in the crossover landscape across human populations, we
built a genetic map in African Americans, who have an average of about 80% West
African and 20% European ancestry, leading to genomes comprised of multi-megabase
stretches of either West African or European ancestry14. Computational approaches,
including HAPMIX15, have been developed to infer the probability of 0, 1 or 2
European or African alleles at each locus in individuals genotyped at hundreds of
thousands of single nucleotide polymorphisms (SNPs)15-17. Positions where the
inferred number of European or African alleles changes reflect crossover events that
have occurred since admixture began (on average six generations ago15). Change in
the probability of European ancestry between adjacent SNPs can be interpreted as the
probability of such a crossover between them. We inferred crossover events in 29,589
apparently unrelated African Americans who had been genotyped on SNP arrays in
genetic association studies (Methods; Fig. 1a). To minimize false-positive
crossovers, we restricted to crossovers that HAPMIX inferred with a probability of
>95%, and that were flanked by a minimum of 2-centimorgan (cM) stretches where the
ancestry was inferred to be unchanging (Supplementary Note 1). This produced
2,113,293 high-confidence crossovers, with a typical switch point resolved within
70 kb with probability 50% (Supplementary Note 1).
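
The interpretation of ancestry-probability changes as crossover probabilities can be illustrated with a toy calculation. This sketch simplifies heavily: it treats a single haploid chromosome with one probability of European ancestry per SNP, whereas HAPMIX reports probabilities of 0, 1 or 2 European alleles for diploid genotypes. The function names and the example values are hypothetical.

```python
def crossover_probabilities(european_prob):
    """Per-interval crossover probability: |change| in ancestry probability
    between each pair of adjacent SNPs."""
    return [abs(b - a) for a, b in zip(european_prob, european_prob[1:])]

def most_likely_interval(european_prob):
    """Index and probability of the inter-SNP interval most likely to hold the switch."""
    probs = crossover_probabilities(european_prob)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Toy ancestry probabilities at five consecutive SNPs: African on the left,
# European on the right, with the switch poorly localized in the middle.
example = [0.01, 0.02, 0.40, 0.97, 0.99]
interval_probs = crossover_probabilities(example)
total_switch_probability = sum(interval_probs)
best_interval, best_probability = most_likely_interval(example)
```

In this toy case the window as a whole carries a switch probability of about 0.98, which would clear a threshold like the >95% used above, while no single interval exceeds roughly 0.57. That is the sense in which an individual crossover is only roughly localized.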

To  build  a  high-resolution  African-American  genetic  map  ( AA  map), 
we  leveraged  the  fact  that  most  crossovers  occur  in  hotspots  shared 
across  individuals2  (Methods).  Intuitively,  although  any  crossover  can 
only be roughly localized, inter-SNP intervals that are inferred to have
an  appreciable  probability  of  crossover  in  multiple  individuals  are  likely 
to  contain  recombination  hotspots,  allowing  much  better  localization 
(Supplementary  Fig.  1).  To  implement  this  idea,  we  modelled  the 
recombination rate for each inter-SNP interval as shared across
individuals and used Markov chain Monte Carlo (MCMC) to sample rates
consistent  with  the  data  (Methods).  This  provides  well-calibrated 
estimates  of  the  crossing-over  rate  between  all  pairs  of  markers  as  well 
as estimates of rate uncertainty (Supplementary Note 1 and
Supplementary Fig. 2). We find that the interval size at which the average
recombination  rate  is  equal  to  the  standard  error  is  6  kb,  which  is  the 
same  accuracy  that  would  be  expected  from  a  map  based  on  500,000 
crossovers  whose  boundaries  were  precisely  resolved  (Supplementary 
Note  1).  Despite  this  high  resolution,  there  are  also  some  limitations. 
First, the AA map does not separately infer male and female
recombination rates (it is a sex-averaged map) and requires normalization by
the  total  map  length  (like  linkage  disequilibrium  maps3,18).  Second,  the 
map has less resolution and may miss a higher fraction of true
crossovers at loci where it is more difficult to detect and resolve crossovers
owing  to  low  SNP  density  or  low  differentiation  between  West 
Africans  and  Europeans.  Third,  the  map  may  be  biased  where  ancestry 
deviates  from  the  average,  for  example  at  chromosome  8q24,  where 
the  10%  of  the  people  in  this  study  who  have  prostate  cancer  have  an 
increased  proportion  of  African  ancestry19.  Fourth,  the  map  assumes 
that  all  individuals  are  unrelated,  whereas  in  fact  there  is  probably 
some shared ancestry, resulting in multiple counting of some
crossovers and an overestimation of map precision.

To assess the accuracy of the AA map, we generated an
independent African-American pedigree map by analysing 222 nuclear families

Figure 1 | Building an African-American genetic map. a, HAPMIX detection
of crossovers between segments of inferred ancestry is illustrated in a
father-mother-child trio. Black segments show inferred crossovers; arrows show
transmission of ancestral crossovers from parent to child; purple/green segments
show de novo events (paternal/maternal origin, respectively) corresponding to
events identified directly using two additional children (bottom, ‘pedigree
inferred’). b, The AA map localizes five hotspots in a region of the MHC whose
positions (blue) were previously mapped by sperm typing1. c, Comparison of
maps shows a hotspot at 33.1 Mb in the African-derived AA and YRI maps, but
not the deCODE and CEU maps (all maps smoothed to 10 kb).


that included 1,056 meioses in which we could directly detect
crossovers between parent and child (Methods; Fig. 1a). Examination of
the  AA  map  rate  around  directly  detected  crossovers  confirms  the 
high  resolution:  the  rate  around  such  crossovers  shows  at  least  as 
strong a peak as that observed in maps based on linkage
disequilibrium2,3,18 (Supplementary Fig. 3). We next computed correlation
coefficients for both the AA map and the deCODE map4 to maps derived
from  the  breakdown  of  linkage  disequilibrium  in  Europeans  (CEU) 
and West Africans (YRI)18. At broad scales (≥3 Mb) they are almost
identical ($\rho \geq 0.97$; Table 1). At fine scales, the AA map is more
accurate (Table 1 and Supplementary Table 1), as reflected in a modest
improvement in correlation to the CEU map at a 3-kb scale
($\rho_{\mathrm{AA,CEU}} = 0.66$ versus $\rho_{\mathrm{deCODE,CEU}} = 0.58$), and a major
improvement for the YRI map, also at a 3-kb scale ($\rho_{\mathrm{AA,YRI}} = 0.71$ versus
$\rho_{\mathrm{deCODE,YRI}} = 0.53$). The deCODE map is more correlated to the
CEU  map  than  to  the  YRI  map  at  scales  <1  Mb,  suggesting  that  this 
map,  built  in  Icelanders,  reflects  more  European  recombination  rates. 
The  AA  map  shows  the  opposite  pattern,  suggesting  that  it  reflects 
more  West  African  recombination  patterns. 
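
The scale-dependent comparison of maps can be sketched in a few lines of Python. The sketch assumes that both maps have already been expressed as crossover rates on a common grid of equal-sized bins; the function names and the toy rates are illustrative assumptions, not the procedure used to produce Table 1.

```python
import math


def aggregate(rates, window):
    """Sum fine-scale rates into non-overlapping windows of `window` bins."""
    return [sum(rates[i:i + window])
            for i in range(0, len(rates) - window + 1, window)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def map_correlation(map_a, map_b, window):
    """Correlation between two maps after smoothing to a given scale."""
    return pearson(aggregate(map_a, window), aggregate(map_b, window))


# Two toy maps whose hotspots sit in neighbouring bins: they disagree at the
# finest scale but agree once adjacent bins are pooled.
map_a = [0, 4, 0, 0, 0, 0, 4, 0]
map_b = [4, 0, 0, 0, 0, 0, 0, 4]
fine = map_correlation(map_a, map_b, 1)
broad = map_correlation(map_a, map_b, 2)
```

The toy example shows why correlations rise with interval size: small localization errors that lower the fine-scale correlation are absorbed when rates are pooled over larger windows.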


Population  differences  in  hotspot  locations 

We  compared  the  rate  estimates  for  all  four  maps  (AA,  deCODE,  CEU 
and  YRI)  over  a  200-kb  region  within  the  major  histocompatibility 
complex  (MHC)  locus  where  recombination  rates  in  European  males 
have been characterized through sperm typing1 (Fig. 1b). The AA map
detects five of six known hotspots, and localizes them to within 1 kb (the
sixth hotspot is weak, with a peak male rate below the genome average1).
Notably, the two maps based on samples with African ancestry (AA and
YRI) found a hotspot not present in either map based on samples of
European ancestry (deCODE and CEU) (Fig. 1c; Supplementary Fig. 4
gives a second example). We confirmed that such ‘African-enriched’
hotspots also occur genome-wide, by examining 2,375 loci with
recombination rate peaks in the YRI map (>5 cM Mb$^{-1}$) but not the CEU
map (<1 cM Mb$^{-1}$), and finding a rate rise in the independently
generated  AA  map,  but  not  in  the  deCODE  map  (Supplementary 
Fig.  5A).  In  the  reciprocal  experiment  searching  for  European-specific 
hotspots,  we  find  no  such  evidence  for  genuine  ancestry  specificity;  at 
loci  with  recombination  rate  peaks  in  the  CEU  map  but  not  the  YRI 
map,  there  are  weak  peaks  in  both  the  deCODE  and  AA  maps 


Table 1 | Genetic map assessments at different size scales

Scale            Pearson correlation (ρ) of the AA map        Estimated correlation   Estimated coefficient of
(interval size)  (deCODE map) to the specified LD map         of AA map to the true   variation of AA map (s.e.
                                                              map (inferred by        divided by crossover rate
                 Combined LD*   CEU            YRI            MCMC)†                  expected for interval size)†
3 kb             0.75 (0.63)    0.66 (0.58)    0.71 (0.53)    0.93                    1.41
10 kb            0.82 (0.74)    0.73 (0.70)    0.78 (0.65)    0.96                    0.73
30 kb            0.86 (0.83)    0.78 (0.78)    0.83 (0.74)    0.98                    0.36
100 kb           0.91 (0.89)    0.84 (0.85)    0.87 (0.81)    0.99                    0.17
300 kb           0.94 (0.93)    0.89 (0.90)    0.92 (0.88)    1.00                    0.08
1 Mb             0.97 (0.96)    0.94 (0.94)    0.95 (0.95)    1.00                    0.04
3 Mb             0.98 (0.98)    0.97 (0.97)    0.98 (0.97)    1.00                    0.02

The numbers in this table are restricted to the autosomes and genomic segments more than 5 Mb from the telomeres. LD, linkage disequilibrium; s.e., standard error.
*The combined map is the HapMap2 population-averaged linkage-disequilibrium-based map18.
†The s.e. of the map at each size scale is determined by the posterior probability distribution from the MCMC.

(Methods and Supplementary Fig. 5B). Thus, hotspots active in
Europeans are consistently ‘shared’ with YRI and African Americans,
whereas populations with African ancestry harbour additional,
non-shared hotspots that we call ‘African-enriched’.

Mapping  variants  underlying  population  differences 

To  understand  the  features  of  recombination  in  West  Africans  that 
differ  from  Europeans,  we  estimated  the  degree  to  which  each 
African-American person’s crossovers occur in African-enriched
hotspots, compared with shared hotspots, a phenotype we refer to as their
African enrichment (AE). We view each individual’s crossovers as
sampled from a mixture of two genetic maps (an ‘S map’ of shared
hotspots based on the deCODE map, and an ‘AE map’ of
African-enriched hotspots that is learned from comparing the deCODE and
AA maps), so that the proportion of crossovers assigned to the AE
map is a person’s AE phenotype (Supplementary Note 4). We tested
approximately 3 million SNPs (genotyped and imputed) for
association with three phenotypes: AE, usage of linkage-disequilibrium-based
hotspots known to be enriched for the 13-bp motif
CCNCCNTNNCCNC20, and genome-wide crossover rate (in pedigrees)
(Methods and Supplementary Note 4). In crossovers detected in
unrelated African Americans, the alleles a person carries are only
sometimes  descended  from  the  ancestor  in  whom  the  crossover 
occurred,  thus  adding  noise  to  the  association  signal  (nevertheless 
there  is  useful  signal  given  the  large  sample  size;  Supplementary 
Note  4).  In  the  pedigree  map,  association  between  alleles  and  AE  can 
be  tested  directly  because  we  have  genotypes  in  the  parents. 
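
The idea of an AE phenotype as a mixture proportion can be made concrete with a small Python sketch. It assumes that, for each of a person's crossovers, the likelihood of its position under the S map and under the AE map is already available, and it estimates the mixing proportion by expectation-maximization. The EM estimator is a stand-in chosen for brevity; the analysis described here uses an MCMC (Supplementary Note 4), and the function name and toy likelihoods are hypothetical.

```python
def estimate_ae(lik_s, lik_ae, n_iter=200, start=0.5):
    """Maximum-likelihood mixing proportion of the AE map for one person.

    lik_s[i] and lik_ae[i] are assumed, strictly positive likelihoods of
    crossover i under the S map and the AE map respectively.
    """
    pi = start
    for _ in range(n_iter):
        # E-step: probability that each crossover was drawn from the AE map.
        resp = [pi * a / (pi * a + (1.0 - pi) * s)
                for s, a in zip(lik_s, lik_ae)]
        # M-step: the new proportion is the mean responsibility.
        pi = sum(resp) / len(resp)
    return pi


# Toy individual: three crossovers fit the AE map far better, one fits the
# S map far better.
ae_value = estimate_ae(lik_s=[0.01, 0.01, 0.01, 1.0],
                       lik_ae=[1.0, 1.0, 1.0, 0.01])
```

With only a handful of crossovers per person the estimate is noisy, which is consistent with the restriction, stated in the Methods, to individuals with a minimum number of inferred crossovers.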

The  SNP  showing  the  strongest  association  with  AE  is  rs6889665 
($P = 1.5 \times 10^{-246}$; Fig. 2a and Supplementary Fig. 6), which has a
derived allele frequency of 29% in YRI and 2% in CEU, and is within
4 kb of the ZF array of PRDM9 (refs 4, 9, 11-13). This SNP is
associated with AE in both the pedigree individuals and the unrelated
individuals (Supplementary Note 4), and is also the SNP most
strongly associated with usage of linkage-disequilibrium-based
hotspots ($P = 1.8 \times 10^{-52}$) (Supplementary Table 2). No locus outside
PRDM9 is significant (P < 0.01 after Bonferroni correction;
Supplementary Table 2). To understand better the association at
rs6889665,  we  inferred  the  alleles  in  the  PRDM9  ZF  array  carried 

Figure 2 | Association of PRDM9 genetic variation with hotspot activity.
a, A genome-wide association study measuring association of the AE
phenotype shows a single genome-wide significant peak at PRDM9, with
rs6889665 the best-associated SNP. b, Relationship between alleles of
rs6889665 and predicted binding target of the PRDM9 ZF array9 for West
African and European samples. The binding predictions are grouped into 8
clusters according to their best-matching region to the 13-bp motif, and
annotated by the number of bases matching the motif. The African-enriched
rs6889665 C allele always co-occurs with motifs with a poor (5/8) match to the
13-bp motif. c, Gene tree25 of the linkage disequilibrium block containing the
PRDM9 ZF array (Methods); numbered circles show SNPs and significant P
values for association, after conditioning on rs6889665.

by 139 individuals based on sequencing data from the 1000 Genomes
Project10,  using  the  reads  to  infer  each  individual’s  PRDM9  alleles 
among  29  alleles  whose  full  sequences  were  previously  determined9 
(Supplementary  Note  5).  Grouping  PRDM9  alleles  on  the  basis  of  how 
closely  their  binding  target  predictions  match  the  8  non-degenerate 
bases  of  the  13-bp  motif,  following  a  previously  described  approach9, 
we find that the ancestral ‘T’ variant at rs6889665 is strongly
correlated to alleles with an exact (8/8) match to the 13-bp motif (including
the A and B alleles), whereas the derived ‘C’ variant is almost perfectly
correlated to a group of alleles, all predicted to bind a common,
different 17-bp motif, CCgCNgtNNNCgtNNCC9, which matches
the  13-bp  motif  at  only  5  bases  (5/8  match;  less  strongly  signalled 
bases  in  the  motif  are  in  lowercase  and  ‘N’  may  be  any  base).  This 
implies  a  common  historical  origin  for  alleles  matching  this  17-bp 
motif  (Fig.  2b,  Supplementary  Fig.  7  and  Supplementary  Note  5).  We 
also  experimentally  measured  the  number  of  ZF  domains  in  PRDM9 
in  354  individuals  including  166  African  Americans  from  the  pedigree 
study  (Methods).  This  showed,  again,  that  rs6889665  differentiates 
PRDM9  alleles  into  two  different  classes,  with  96%  of  haplotypes 
carrying the ancestral allele having <14 ZFs, and 93% of haplotypes
carrying the derived allele having ≥14 ZFs (Supplementary Fig. 7).
After  conditioning  on  rs6889665,  there  is  no  evidence  that  ZF  array 
length  is  associated  with  the  AE  phenotype.  Several  SNPs  near  the 
PRDM9  ZF  array  show  a  conditional  association  signal  that  is  much 
weaker  than  rs6889665,  but  still  significant  (Fig.  2c,  Supplementary 
Fig. 6 and Supplementary Note 4), with the strongest at rs10043097
($P = 8.3 \times 10^{-14}$), upstream of the PRDM9 transcription start site.
These  SNPs  may  tag  additional  variation  in  the  PRDM9  ZF  array,  or 
potentially  expression  levels. 

Finding  a  motif  for  African-enriched  hotspots 

To  identify  directly  candidate  African-enriched  hotspot  motifs,  we 
selected  2,454  loci  with  a  high  crossover  rate  in  the  AE  map  and 
YRI map (>2 cM Mb$^{-1}$ over 2 kb), and no more than half this rate
in  the  S  map  and  CEU  map  (this  set  is  more  powerfully  enriched  for 
higher  recombination  in  people  of  African  ancestry  than  the  2,375 
above,  as  it  includes  information  from  the  contemporary  maps).  We 
compared  these  to  a  ‘control  set’  of  7,328  candidate  hotspots  more 
active  in  the  European-  than  the  African-derived  maps  (Methods  and 
Supplementary  Note  6).  To  identify  sequence  motifs  associated  with 
the African-enriched hotspots3,21, we identified short motifs that



Figure 3 | A sequence motif specifying the positions of African-enriched
hotspots. a, Logo plot showing a degenerate 17-bp hotspot motif, with stack
height proportional to -log P value, and relative letter height proportional to
the mean crossover rate increase given each base. Below is the bioinformatic
PRDM9 binding prediction for the alleles associated with rs6889665 allele C
(from Fig. 2b), matching this motif at 10/11 bases (lines). b, Average crossover


occurred  at  increased  frequency  in  the  African-enriched  hotspot  set 
(Supplementary  Note  6).  Testing  all  motifs  with  lengths  of  5-9  bases 
revealed a 9-nucleotide motif CCCCAGTGA (odds ratio (OR) = 1.79,
$P = 2.24 \times 10^{-8}$, Bonferroni corrected P = 0.004), which exhibited a
kilobase-scale rate peak near occurrences of this motif in
African-derived maps, but in neither of the European-derived maps
(Supplementary Fig. 8). Further analysis revealed a strong influence of
downstream  flanking  bases  (Supplementary  Fig.  9)  and  degeneracy, 
yielding  a  17-bp  consensus  sequence,  CCCCaGTGAGCGTtgCc 
(Fig.  3a;  more  strongly  signalled  bases  are  in  uppercase),  with  the 
same  consensus  obtained  when  we  considered  flanking  sequences 
for  only  odd  or  even  chromosomes,  and  whether  we  based  the  analysis 
on  AE-S  or  YRI-CEU  map  comparisons  (Supplementary  Note  6). 
The  500  best  matches  to  this  motif  have  a  ~3-fold  increase  in  average 
rate  in  the  AA  and  YRI  relative  to  the  deCODE  and  CEU  maps  (Fig.  3b 
and  Supplementary  Fig.  8G).  Hotspots  associated  with  the  motif  occur 
in  both  unique  and  repetitive  DNA  (for  example,  L1PA10/13  LINE 
elements;  Supplementary  Fig.  10  and  Supplementary  Note  6).  We  also 
compared  the  17-bp  consensus  to  the  binding  motif  predicted  for  5/8 
match  alleles,  and  found  that  they  match  almost  precisely  (Fig.  3a;  10 
of 11 bases, $P = 8.1 \times 10^{-6}$).
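
A motif enrichment test of this kind can be sketched as follows. The sketch scans both strands of each sequence for a motif in which `N` stands for any base, then reports an odds ratio for motif presence in a hotspot set versus a control set. The helper names, the toy sequences and the use of a 0.5 continuity correction (Haldane-Anscombe) are assumptions for illustration and are not taken from Supplementary Note 6.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Compile a motif in which 'N' matches any of A, C, G or T."""
    return re.compile("".join("[ACGT]" if b == "N" else b
                              for b in motif.upper()))


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains(seq, pattern):
    """True if the motif occurs on either strand of `seq`."""
    s = seq.upper()
    return bool(pattern.search(s) or pattern.search(reverse_complement(s)))


def odds_ratio(hotspots, controls, motif):
    """Odds ratio of motif presence, with 0.5 added to every cell."""
    pat = motif_regex(motif)
    a = sum(contains(s, pat) for s in hotspots)
    b = len(hotspots) - a
    c = sum(contains(s, pat) for s in controls)
    d = len(controls) - c
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Toy sequences (hypothetical); the second hotspot carries the motif on the
# reverse strand.
hot = ["TTCCCCAGTGATT", "AATCACTGGGGAA", "ACGTACGT"]
ctl = ["ACGTACGT", "GGGGTTTT", "CCCCAGTGA", "TTTTTTTT"]
toy_or = odds_ratio(hot, ctl, "CCCCAGTGA")
```

The same `motif_regex` helper accepts the degenerate 13-bp motif CCNCCNTNNCCNC, so one scanner can serve both the shared and the African-enriched motif.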

Assessing  the  impact  of  PRDM9  on  recombination 

How  much  of  the  African-enriched  recombination  pattern  can  be 
explained by PRDM9? We estimated the fraction of variation in the
AE  phenotype  explained  by  rs6889665  in  our  pedigree  data  after 
accounting  for  noise  in  the  phenotype  estimation  (Supplementary 
Note  4).  Over  82%  of  map  usage  variability  is  explained  by  the 
rs6889665  genotype  alone.  Given  that  there  are  further  influential 
PRDM9 variants (Fig. 2c), this gene may thus explain almost all
differences in local rate between the West African and European
populations. We next examined rates around 82 narrowly defined
(<10  kb)  crossover  sites  in  7  individuals  homozygous  for  the  derived 
allele  at  rs6889665.  There  is  no  evidence  of  hotspots  at  these  loci  in 
either  the  deCODE  or  CEU  maps  (Fig.  3c),  in  contrast  to  crossovers  in 
individuals  carrying  the  ancestral  allele  at  rs6889665  (Supplementary 
Fig.  11).  Thus,  crossover  positions  in  individuals  who  are  homozygous 
for the derived allele at rs6889665 are consistent with an entirely
different recombination hotspot landscape, which would imply PRDM9
control of all hotspots9. Despite the strong correlation between maps at
megabase scales, there is mounting evidence that PRDM9’s influence


rate (in 2-kb sliding windows) in the AA (red line) and deCODE (black line)
maps surrounding the 500 strongest motif matches. c, In seven rs6889665 CC
individuals  from  the  pedigree  study,  we  localized  82  crossovers  to  within  10  kb, 
and  plot  average  AA,  YRI,  deCODE  and  CEU  map  rates.  There  is  no  strong 
peak  above  local  background  in  the  deCODE  or  CEU  maps. 

on crossing over may not be limited to fine scales4,11: we observe a
weakly significant association of rs6889665 with the total number of
crossovers genome-wide in pedigrees (P = 0.04), corresponding to an
average  1.3  crossovers  more  per  meiosis  per  derived  allele,  exceeding 
the  strongest  previously  known  association22  at  RNF212. 

Conclusions 

We  have  shown  that  PRDM9  alleles  that  bind  a  novel  17-bp  motif  and 
occur  at  greatly  increased  frequency  in  people  of  West  African  ancestry 
have  led  to  a  shift  in  the  recombination  landscape  compared  with 
people of non-African ancestry. The larger number of hotspots
available to West Africans implies that at the population level, crossovers
are  more  evenly  distributed  than  in  Europeans10,  and  thus  the  shorter 
extent  of  West  African  linkage  disequilibrium  is  not  due  to  differences 
in demographic history alone (such as the lack of an out-of-Africa
founder  event)23.  Our  findings  also  have  medical  implications,  as 
recombination  errors  leading  to  insertions  or  deletions  are  known  to 
be  associated  with  recombination  hotspots9,21,24.  Our  results  predict 
that  the  congenital  abnormalities  that  have  been  associated  with  the 
recombination hotspots bound by PRDM9 A and B alleles will occur at a
decreased rate in people of West African ancestry, whereas new
diseases will arise due to recombination errors near African-enriched
hotspots. 

METHODS  SUMMARY 

We  assembled  SNP  array  data  from  29,589  unrelated  people  and  222  nuclear 
families genotyped at 490,000-910,000 SNPs from the Candidate Gene Association
Resource  (CARe),  studies  at  the  Children’s  Hospital  of  Philadelphia  (CHOP),  the 
African  American  Breast  Cancer  Consortium,  the  African  American  Prostate 
Cancer  Consortium  and  the  African  American  Lung  Cancer  Consortium.  To  build 
a  recombination  map,  we  used  HAPMIX  to  localize  candidate  crossover  positions15, 
and  implemented  a  MCMC  that  used  the  probability  distributions  for  the  positions 
of  the  filtered  crossovers  to  infer  recombination  rates  for  each  of  1.3  million  inter- 
SNP  intervals.  We  also  implemented  a  second  MCMC  that  models  each  individual’s 
set  of  crossovers  as  a  mixture  of  an  S  map,  similar  to  the  European  deCODE  map, 
and an AE map, and then assigned each individual an ‘AE phenotype’
corresponding to the proportion of their newly detected crossovers assigned to the AE map. We
imputed  genotypes  at  up  to  three  million  HapMap2  SNPs18  and  then  tested  each  of 
these  SNPs  for  association  with  the  AE  phenotype  and  other  recombination- 
related  phenotypes.  We  identified  2,454  candidate  African-enriched  hotspots  with 
increased  recombination  rates  in  the  YRI  versus  CEU  maps,  and  in  the  AE  versus  S 
maps,  and  searched  for  motifs  enriched  at  these  loci,  thus  identifying  a  degenerate 
17-bp  motif.  To  study  the  structure  of  PRDM9,  we  measured  the  length  of  the 
PRDM9  ZF  array  and  genotyped  rs6889665  in  YRI,  CEU  and  the  CARe  nuclear 
families;  we  also  carried  out  imputation  based  on  1000  Genomes  Project  short  read 
data10  to  infer  the  alleles  individuals  carry,  among  29  previously  characterized  in  a 
sequencing  study  of  PRDM9  (ref.  9). 

Full  Methods  and  any  associated  references  are  available  in  the  online  version  of 
the  paper  at  www.nature.com/nature. 

Samples used for building the AA map. The 29,589 unrelated African-American
samples  derive  from  five  sources.  Informed  consent  was  provided  by  all  the 
individuals  participating  in  the  study,  and  was  approved  by  all  of  the  institutions 
responsible  for  sample  collection. 

The  first  source  is  the  Candidate  Gene  Association  Resource  (CARe)  study,  a 
consortium  of  cohorts.  We  analysed  CARe  samples  genotyped  on  the  Affymetrix 
6.0  array  from  the  Atherosclerosis  Risk  in  Communities  study  (ARIC),  the 
Cleveland  Family  Study  (CFS),  the  Coronary  Artery  Risk  Development  in 
Young  Adults  study  (CARDIA),  the  Jackson  Heart  Study  (JHS)  and  the  Multi- 
Ethnic  Study  of  Atherosclerosis  (MESA).  After  removing  individuals  known  to  be 
related,  and  restricting  to  SNPs  with  good  completeness  in  all  cohorts,  we  had 
data  from  6,209  individuals  typed  at  580,000  SNPs. 

The  second  source  consists  of  diverse  studies  carried  out  at  the  Children’s 
Hospital  of  Philadelphia  (CHOP),  which  has  established  a  biobank  for 
Philadelphia  children  to  facilitate  large  genotype-phenotype  association  analysis. 
The  cohort  was  recruited  by  CHOP  clinicians,  nursing  and  medical  assistant  staff 
within the CHOP Health Care Network, including primary care clinics and
outpatient practices, from the hospital’s patient base of over one million paediatric
patients. All samples analysed here were genotyped on either the Illumina
610-Quad or Illumina HumanHap550 array. After removing individuals known to be
related, identifying African Americans by multidimensional scaling on
genotype data, and restricting to SNPs with a high level of completeness across
samples, we had data from 7,503 samples typed at 491,572 SNPs.

The  third  source  is  the  African  American  Breast  Cancer  Consortium 
(AABCC),  consisting  of  the  Multiethnic  Cohort  study  (MEC),  the  Los  Angeles 
component  of  the  Women’s  Contraceptive  and  Reproductive  Experiences  study 
(CARE),  the  Women’s  Circle  of  Health  Study  (WCHS),  the  San  Francisco  Bay 
Area  Breast  Cancer  study  (SFBC),  the  Carolina  Breast  Cancer  Study  (CBCS),  the 
Prostate,  Lung,  Colorectal  and  Ovarian  Cancer  Screening  Trial  Cohort  (PLCO), 
the  Nashville  Breast  Health  Study  (NBHS)  and  the  Wake  Forest  University  Breast 
Cancer Study (WFBC), all genotyped on an Illumina 1M array. After data
curation, including removal of samples with genetic evidence of being second-degree
relatives  or  closer  using  the  smartrel  package  of  EIGENSOFT26  (>0.2  correlation 
of  genotype  state),  we  had  data  from  5,203  women  (about  half  cases  and  half 
controls)  typed  at  894,717  SNPs. 

The  fourth  source  is  the  African  American  Prostate  Cancer  Consortium 
(AAPCC),  consisting  of  the  MEC,  the  Southern  Community  Cohort  Study 
(SCCS),  PLCO,  the  Cancer  Prevention  Study  II  Nutrition  Cohort  (CPS-II),  the 
Prostate  Cancer  Case-Control  Studies  at  MD  Anderson  (MDA),  the  Identifying 
Prostate  Cancer  Genes  study  (IPCG),  the  Los  Angeles  Study  of  Aggressive  Prostate 
Cancer  (LAAPC),  the  Prostate  Cancer  Genetics  Study  (CaP  Genes),  the  Case- 
Control  Study  of  Prostate  Cancer  among  African  Americans  in  Washington  DC 
(DCPC),  the  Gene-Environment  Interaction  in  Prostate  Cancer  Study  (GECAP) 
and  the  Cancer  Prevention  Study  II  (CPS-II),  all  typed  on  an  Illumina  1M  array. 
After  the  same  data  curation  as  the  breast  cancer  study,  we  had  data  from  6,540 
men  (about  half  cases  and  half  controls)  typed  at  896,036  SNPs. 

The  fifth  source  is  individuals  from  the  African  American  Lung  Cancer 
Consortium  (AALCC),  including  cases  and  controls  from  the  MEC,  the  SCCS, 
PLCO,  the  MD  Anderson  (MDA)  African  American  Lung  Cancer  Study,  the 
NCI-Maryland  Lung  Cancer  Case-Control  Study,  the  University  of  California 
at  San  Francisco  African  American  Lung  Cancer  Study  and  the  Wayne  State 
African  American  Lung  Cancer  Study,  all  genotyped  on  the  Illumina  1M  array. 
After  data  curation,  we  had  data  from  4,134  individuals  typed  at  906,687  SNPs. 

Samples used for building the pedigree map. The pedigree map was built using
data  from  135  African-American  nuclear  families  from  CARe  and  87  African- 
American  families  from  CHOP  for  which  genotyping  data  were  available  from  at 
least  two  full  siblings  and  at  least  one  parent.  The  CARe  studies  that  contributed 
samples  were  JHS  (70  families,  including  58  samples  that  we  newly  genotyped  on 
the  Affymetrix  6.0  array  to  increase  the  number  of  crossovers  we  could  analyse) 
and  CFS  (65  families).  For  the  families  with  a  missing  parent,  we  developed  a 
Hidden  Markov  Model  (HMM)  approach  to  jointly  estimate  the  genotype  of  the 
missing  parent  as  well  as  to  infer  the  position  of  crossover  events  in  the  offspring. 
The  observed  variables  in  the  HMM  were  the  genotypes  of  the  available  family 
members  and  the  states  of  the  HMM  were  the  genotypes  of  the  parents  and  the 
identity  by  descent  (IBD)  status  of  the  children.  A  change  in  IBD  status  in  an 
offspring  is  interpreted  as  a  crossover  event.  Supplementary  Note  2  provides 
details  of  the  HMM  used  to  infer  positions  of  these  pedigree  crossover  events. 

Local ancestry inference and identification of crossover events. We merged the
data  for  each  cohort  with  phased  YRI  and  CEU  data  from  the  HapMap3  data  set27. 
We filtered SNPs that had a frequency inconsistent with an 80-20% linear
combination of YRI and CEU frequencies (f statistic with an absolute value greater
than 3), potentially reflecting genotyping error in either the HapMap3 or the
cohort  data. 
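
One way to picture this frequency filter is the Python sketch below. The exact statistic is not spelled out in this section, so the sketch substitutes a simple binomial z-score for the difference between the observed allele frequency and the 80-20% combination of YRI and CEU frequencies, ignoring sampling error in the reference panels. The function names, the z-score itself and the example values are assumptions; only the 80-20% weighting and the threshold of 3 come from the text.

```python
import math


def admixture_z(obs_freq, yri_freq, ceu_freq, n_individuals, yri_weight=0.8):
    """Assumed z-score for deviation from the expected admixed frequency."""
    expected = yri_weight * yri_freq + (1.0 - yri_weight) * ceu_freq
    # Binomial standard error for 2N sampled chromosomes.
    se = math.sqrt(expected * (1.0 - expected) / (2 * n_individuals))
    if se == 0.0:
        return 0.0 if obs_freq == expected else math.inf
    return (obs_freq - expected) / se


def keep_snp(obs_freq, yri_freq, ceu_freq, n_individuals, threshold=3.0):
    """Keep a SNP whose statistic has an absolute value of at most 3."""
    z = admixture_z(obs_freq, yri_freq, ceu_freq, n_individuals)
    return abs(z) <= threshold


# Hypothetical SNPs in a cohort of 1,000 individuals.
consistent = keep_snp(0.50, 0.50, 0.50, 1000)
inconsistent = keep_snp(0.60, 0.50, 0.50, 1000)
```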

We  ran  HAPMIX  on  these  data  using  a  prior  hypothesis  of  20%  European 
ancestry  and  6  generations  since  mixture  for  each  individual15.  HAPMIX  requires 
users  to  input  a  recombination  map  as  a  prior  distribution,  and  we  assumed  that 
rates  were  constant  across  each  chromosome  arm  with  a  total  rate  across  each  arm 
determined  by  the  Rutgers  genetic  map6  (Supplementary  Note  1). 

Filtering  of  crossover  events  had  three  stages.  First,  we  removed  crossover 
events  where  the  probability  of  occurrence  was  estimated  to  be  less  than  95% 
by HAPMIX. Second, we removed candidate crossover events that were
non-monotonic, that is, where the probability of an overlapping crossover event with
an ancestry switch in a different direction was >1% within any inter-SNP
interval. Third, we removed crossover events where either of the two flanking ancestry
blocks  was  smaller  than  2  cM  in  size  as  measured  with  respect  to  a  published  map 
based  on  linkage  disequilibrium3,18  (Supplementary  Note  1).  For  comparisons  to 
the  deCODE  map  and  linkage-disequilibrium-based  maps,  we  also  removed 
segments  of  the  genome  within  5  Mb  of  the  telomeres  (to  be  consistent  with 
the  comparisons  presented  in  the  deCODE  study  where  the  same  restriction 
was  applied4). 
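
The three filtering stages, plus the telomere restriction used for map comparisons, can be written as a single predicate. The record type and its field names below are hypothetical; the thresholds (95%, 1%, 2 cM and 5 Mb) are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    prob: float                 # probability that the crossover occurred
    max_reverse_prob: float     # largest per-interval probability of an
                                # overlapping switch in the other direction
    left_block_cm: float        # size of the flanking ancestry block (left)
    right_block_cm: float       # size of the flanking ancestry block (right)
    dist_to_telomere_mb: float  # distance to the nearest telomere


def passes_filters(c, for_map_comparison=False):
    if c.prob < 0.95:                       # stage 1: confident calls only
        return False
    if c.max_reverse_prob > 0.01:           # stage 2: drop non-monotonic calls
        return False
    if min(c.left_block_cm, c.right_block_cm) < 2.0:  # stage 3: 2-cM flanks
        return False
    if for_map_comparison and c.dist_to_telomere_mb < 5.0:
        return False                        # comparison-only restriction
    return True


good = Candidate(0.99, 0.001, 3.5, 10.0, 2.0)
short_flank = Candidate(0.99, 0.001, 1.5, 10.0, 40.0)
```

Keeping the telomere restriction behind a flag reflects that it applies only to comparisons with the deCODE and linkage-disequilibrium-based maps, not to construction of the AA map itself.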

Construction  of  the  AA  map.  All  22  autosomes  and  chromosome  X  were  split 
into  approximately  1.3  million  inter-SNP  intervals  based  on  the  union  of  SNPs 
analysed  across  all  five  sample  sets.  Our  goal  was  to  estimate  a  crossover  rate  for 
each  of  these  intervals.  We  modelled  crossover  rates  such  that  the  rate  for  each 
SNP  interval  is  independent  of  every  other  SNP  interval,  motivated  by  a  hotspot 
model.  We  used  a  gamma  prior  on  rates  with  the  mean  estimated  from  the  filtered 
HAPMIX  output  (Supplementary  Note  1).  We  used  a  Gibbs  sampler  to  sample 
rates  in  every  SNP  interval  and  to  determine  the  location  of  a  crossover  event 
within  the  95%  range  estimated  by  the  HAPMIX  output.  In  each  round  of  the 
Gibbs  sampler,  we  used  the  set  of  sampled  rates  in  the  previous  round  to  construct 
a  probability  mass  function  for  the  SNP  interval  in  which  each  crossover 
occurred,  using  an  approach  described  in  Supplementary  Note  1  to  approximate 
the  probability  mass  function  that  HAPMIX  would  have  produced  conditional  on 
the  previous  set  of  sampled  rates.  After  sampling  the  location  of  the  crossover 
events,  we  counted  how  many  crossovers  occurred  in  every  SNP  interval.  We  used 
these  counts  to  construct  a  posterior  distribution  for  the  crossover  rate  in  each 
SNP  interval,  taking  advantage  of  the  conjugacy  of  a  Poisson  likelihood  and  a 
gamma  prior.  We  then  sampled  a  crossover  rate  for  each  SNP  interval  from  its 
respective  gamma  posterior  distribution. 
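
The alternation between placing crossovers and resampling rates can be sketched as a small Gibbs sampler. This is a simplified illustration under stated assumptions: each crossover is represented only by the inclusive range of intervals it could fall in, its position is drawn with probability proportional to the current rates (rather than from the approximation to the HAPMIX probability mass function described in Supplementary Note 1), and each interval has unit exposure. The function name, the prior parameters and the toy data are hypothetical.

```python
import random


def gibbs_rates(n_intervals, crossovers, alpha=0.5, beta=1.0,
                n_rounds=500, burn_in=100, seed=0):
    """Posterior mean crossover rate per interval.

    crossovers: list of (lo, hi) inclusive interval ranges, one per event.
    Prior on each rate: Gamma(shape=alpha, rate=beta). With a Poisson
    likelihood for the count k in an interval, the conditional posterior is
    Gamma(shape=alpha + k, rate=beta + 1).
    """
    rng = random.Random(seed)
    rates = [alpha / beta] * n_intervals
    sums = [0.0] * n_intervals
    kept = 0
    for r in range(n_rounds):
        # Step 1: sample an interval for every crossover given current rates.
        counts = [0] * n_intervals
        for lo, hi in crossovers:
            idx = list(range(lo, hi + 1))
            j = rng.choices(idx, weights=[rates[k] for k in idx])[0]
            counts[j] += 1
        # Step 2: sample every rate from its conjugate gamma posterior.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        rates = [rng.gammavariate(alpha + counts[j], 1.0 / (beta + 1.0))
                 for j in range(n_intervals)]
        if r >= burn_in:
            kept += 1
            for j in range(n_intervals):
                sums[j] += rates[j]
    return [s / kept for s in sums]


# Toy data: every crossover range includes interval 2, the shared hotspot.
toy = [(2, 2)] * 5 + [(0, 2)] * 10 + [(2, 4)] * 10 + [(1, 3)] * 10
means = gibbs_rates(5, toy)
```

Although no single toy crossover with a wide range is well localized, the sampler concentrates rate in the interval they all share, which is the intuition behind pooling crossovers across individuals.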

Candidate  African-enriched  hotspots.  To  identify  candidate  African-enriched 
hotspots,  we  used  two  pairs  of  maps:  the  previously  available  YRI  map  and  CEU 
map,  and  the  AE  map  and  the  S  map.  We  combined  information  from  both  map 
pairs  to  enrich  for  regions  with  genuine  differences  between  the  West  African  and 
European populations. Specifically, we identified candidate hotspots as 2-kb
intervals representing a peak in the AE map rate, where the estimated rate in
the AE map was >2 cM Mb^-1 and at least double that in the S map, and in
addition the YRI map rate was >2 cM Mb^-1 and at least double the CEU map
rate.  We  took  the  resulting  candidate  hotspot  set  and  defined  hotspot  boundaries 
by  identifying  the  region  flanking  the  2  kb  rate  peak  that  had  rates  at  least  50%  of 
the  peak  value  in  the  AE  map.  Regions  larger  than  5  kb  were  discarded.  We 
similarly  constructed  a  set  of  ‘shared’  hotspots  but  modified  the  initial  criteria 
given  the  lack  of  obvious  hotspots  present  only  in  people  of  European  ancestry. 
Specifically, we identified 2 kb S map rate peak locations where both the S and
CEU estimated rates were >2 cM Mb^-1, while the AE and YRI map rates were
below  those  in  these  respective  European  populations.  We  then  narrowed  the 
regions  and  filtered  using  the  same  procedure  we  had  developed  for  the  candidate 
African-enriched  hotspots. 
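
The selection rule for candidate African-enriched hotspots, and the 50%-of-peak boundary rule, can be expressed as two small functions. This is a sketch under stated assumptions: rates are taken to be supplied per fixed-width bin in cM/Mb, and the function names and default arguments are hypothetical and simply mirror the thresholds quoted in the text.

```python
def is_african_enriched(ae, s, yri, ceu, min_rate=2.0, fold=2.0):
    """Apply the two-map-pair criterion to one 2-kb peak (rates in cM/Mb)."""
    return (ae > min_rate and ae >= fold * s
            and yri > min_rate and yri >= fold * ceu)


def hotspot_bounds(rates, peak_index, fraction=0.5):
    """Extend outward from the peak while flanking bins stay >= fraction * peak."""
    threshold = fraction * rates[peak_index]
    left = peak_index
    while left > 0 and rates[left - 1] >= threshold:
        left -= 1
    right = peak_index
    while right < len(rates) - 1 and rates[right + 1] >= threshold:
        right += 1
    return left, right
```

A region whose bounds span more than 5 kb would then be discarded, as described above.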

Association testing. MaCH28 was used to impute up to 3,058,149 SNP genotypes
from HapMap2 (ref. 18) into all African Americans we analysed, using the
unrelated YRI and CEU samples as combined reference panels. We tested for
association at all SNPs with minor allele frequency >1%. To restrict our analysis to
individuals in whom the phenotype was measured accurately, we performed the
association analysis with the AE and hotspot usage phenotypes only in
individuals with at least 35 inferred crossovers. Association testing was carried out
using linear regression, after controlling for gender, genome-wide European
ancestry proportion (inferred by HAPMIX) and study (Supplementary Note
4). We observe slight inflation of the association statistics genome-wide
compared with the expectation (the Genomic Control inflation factor29 is 1.046 for the
AE phenotype and 1.038 for the hotspot usage phenotype), which we propose
may reflect cryptic relatedness among samples (Supplementary Note 4). We
report P values after correction using Genomic Control29.
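
For readers unfamiliar with Genomic Control, the correction can be sketched as follows. This is an illustration only, assuming 1-degree-of-freedom chi-squared association statistics and the common convention that the inflation factor is never allowed to fall below 1; the function names are hypothetical and the text does not state which conventions were used.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 df


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a 1-df chi-squared statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_control(stats):
    """Return the inflation factor and the corrected P values."""
    lam = statistics.median(stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never deflate the statistics
    return lam, [chi2_1df_pvalue(s / lam) for s in stats]
```

With an inflation factor of 1.046, as reported for the AE phenotype, each test statistic is divided by 1.046 before its P value is computed.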

Construction  of  PRDM9  tree.  To  examine  the  history  of  the  PRDM9  ZF  array 
and  to  place  SNPs  showing  association  with  AE  map  usage  within  the  framework 
of this history, we identified 19 SNPs from HapMap2 (ref. 18) that surrounded the
ZF array and that form a maximal block of SNPs where there is almost no
evidence of recombination: |D'| = 1 for all pairs of SNPs in the data after removing
2 of 120 YRI and 1 of 120 CEU haplotypes (the chimpanzee genome was used to
define the ancestral alleles). A unique ‘gene tree’ was then built, and we used
genetree25, which assumes a coalescent prior on genealogies, to approximately infer
ages for these mutations conditional on the data (a caveat is that the tree building
does not account for the HapMap SNP ascertainment scheme). Because genetree
assumes a randomly mating population, and the YRI represent almost all the
HapMap haplotype diversity in this region, we ran the software (2,000,000
importance samples, otherwise default parameters) on the YRI data only and used this to
construct Fig. 2c. Each node of the tree corresponds to a unique haplotype at these
19 SNPs, whose frequency in both CEU and YRI is shown at the base of the figure.

Motif searching. We tested all candidate motifs of 5 to 9 base pairs for
enrichment in our African-enriched hotspot set relative to our shared hotspot set. We
counted occurrences of all tested motifs in repeat and non-repeat backgrounds
separately, and computed a separate P value for each genomic background with a
chi-squared test, based on a contingency table that compares the counts of a
particular motif to the counts of all motifs of that size. We converted each P value
to a Z score, added the scores on each background, and then obtained a
corresponding combined P value. Motifs were considered statistically significant only if
they passed four stringent criteria: (1) they were statistically significant after
Bonferroni correction for the number of motifs tested; (2) they were
overrepresented in the African-enriched set; (3) they were statistically significant on both
the repeat and non-repeat backgrounds (P < 0.01) independently; and (4) they
were statistically significant when the joint P value was calculated only by
comparing the frequency of the motif to other motifs of identical G/C content (to
eliminate false positives due to any difference in G/C content between the hotspot
sets). This testing revealed a unique significant motif, the 9-nucleotide oligomer
CCCCAGTGA. We explored whether flanking DNA around exact matches to
this motif also had a role by testing whether bases at a given site relative to the
motif were associated with the difference in rates between African- and
European-ancestry populations (Kruskal-Wallis test). Rates were evaluated in
the 2 kb surrounding each motif occurrence. We separately evaluated flanking
sequence using both the difference between YRI/CEU map rates, and the
difference between the AE/S map rates, leading to the identification of the 17-bp
consensus African-enriched motif (Supplementary Note 6 has full details). To
identify close matches to this 17-bp motif among all matches to the 9-bp motif in
the genome, for every occurrence of the 9-bp motif, we scored the flanking
sequence bases proportionately to the relative increase in average crossover rate
difference associated with each base, then multiplied across bases in the 17-mer
region to provide an overall score. We ranked occurrences according to this score,
and plotted rates around the top 500 (Fig. 3b). We verified these findings by
measuring average crossover differences for each base using only odd
chromosomes and used these to score motif occurrences on the (non-overlapping) set of
even chromosomes, and vice versa (Supplementary Fig. 8).
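
The step in which per-background P values are converted to Z scores and combined can be illustrated with Stouffer's method. This is an assumption on our part: the text says only that the Z scores were added, so the division by the square root of the number of backgrounds (which gives the sum unit variance under the null) and the use of one-sided P values are labelled here as assumptions, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combine_p_values(p_values):
    """Combine one-sided P values via summed Z scores (Stouffer's method)."""
    # -inv_cdf(p) is numerically safer than inv_cdf(1 - p) for very small p.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    combined_z = sum(z_scores) / math.sqrt(len(z_scores))
    return _STD_NORMAL.cdf(-combined_z)
```

For example, `combine_p_values([p_repeat, p_nonrepeat])` would give one combined P value per motif, which could then be compared against a Bonferroni-corrected threshold.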

PRDM9  ZF  length  typing  and  genotyping  of  rs6889665.  To  determine  the 
number  of  ZF  motifs  of  PRDM9  in  a  subset  of  the  samples  used  to  build  the 
map, published primer pairs4 were used to amplify this region (forward:
5'-GGCCAGAAAGTGAATCCAGG-3', reverse: 5'-GGGGAATATAAGGGGTCAGC-3').
Product lengths ranged between 7 and 20 repeats (801-1,893 bp).
Four  of  the  166  African-American  samples  did  not  show  an  amplification  product, 
presumably  because  of  insufficient  DNA  quality.  We  also  genotyped  90  YRI  and  90 
CEU  HapMap  samples. 

The SNP rs6889665 was genotyped in the same samples using an allelic
discrimination assay (forward primer: 5'-aaacttggaacatccatagggt-3', reverse primer:
5'-cgaaaggagaaaagcataatcc-3', Locked Nucleic Acid (LNA) probe ‘C’:
5'-/6-FAM/aGGGatAaatgaag/BHQ/-3', LNA probe ‘T’:
5'-/HEX/AGAGatAaatGaagg/BHQ/-3'; LNA bases are given in capital letters).
Reporter dyes: 6-FAM, 6-carboxyfluorescein; HEX, hexachlorofluorescein.
Quencher: BHQ, Black Hole
Quencher 1. Only one out of the 166 African-American samples failed in this

Previous reports show that obesity predicts biochemical failure after
treatment for localized prostate cancer. Since obesity is associated
with increased fat consumption, we investigated the role that
dietary fat intake plays in modulating obesity-related risk of
biochemical failure. We evaluated the association between saturated
fat intake and biochemical failure among 390 men from a previously
described prostatectomy cohort. Participants completed a
food frequency questionnaire collecting nutrient information for
the year prior to diagnosis. Because fat and energy intake are
highly correlated, the residual method was used to adjust fat (total
and saturated) intakes for energy. Biochemical-failure-free survival
rates were calculated using the Kaplan-Meier method.
Crude and adjusted effects were estimated using Cox proportional
hazards models. During a mean follow-up of 70.6 months, 78 men
experienced biochemical failure. Men who consumed high-saturated-fat
(HSF) diets were more likely to experience biochemical
failure (p = 0.006) and had significantly shorter biochemical-failure-free
survival than men with low-saturated-fat (LSF) diets
(26.6 vs. 44.7 months, respectively, p = 0.002). After adjusting for
obesity and clinical variables, HSF-diet patients were almost twice
as likely to experience biochemical failure (hazard ratio = 1.95,
p = 0.008) compared to LSF-diet patients. Men who were both
obese and consumed HSF diets had the shortest biochemical-failure-free
survival (19 months), and nonobese men who consumed
LSF diets had the longest biochemical-failure-free survival
(46 months, p < 0.001). Understanding the interplay between modifiable
factors, such as diet and obesity, and disease characteristics
may lead to the development of behavioral and/or targeted
interventions for patients at increased risk of progression.

©  2008  Wiley-Liss,  Inc.

The identification of modifiable factors that may influence long-term
outcome for prostate cancer (PCa) has considerable potential
to reduce morbidity and mortality.1-3 Our group and others have
reported that obesity is associated with increased risk of biochemical
failure after treatment with radical prostatectomy4,5 or external
beam radiation6 for localized disease. Since the prevalence of obesity
in U.S. adults has reached epidemic proportions, furthering
our understanding of the relationship between obesity-related risk
and PCa outcome has become an increasingly important public
health issue.

The epidemiological associations of high-fat diets with
obesity7-9 and of higher fat consumption with increased PCa risk
and mortality110 have been well documented. It has been suggested
that some types of fat (i.e., monounsaturated) may actually
protect against PCa,13-15 whereas saturated fat consumption has
been more consistently associated with PCa risk, especially
advanced disease.11,16,17 To evaluate the role that dietary fat
intake  plays  in  modulating  obesity-related  PCa  progression,  we 
examined  the  association  between  self-reported  dietary  intake  of 
saturated  fat  and  biochemical  failure  in  a  well-defined  cohort  of 
PCa  patients  treated  by  radical  prostatectomy.5 

Subjects  and  methods 

The  study  population  is  a  subset  of  a  previously  described 
cohort  of  526  patients  at  The  University  of  Texas  M.D.  Anderson 
Cancer  Center.5  All  patients  had  clinically  organ-confined  PCa  at 
time of diagnosis and were treated with prostatectomy only. Due
to the limited number of African-American and Hispanic participants,
as well as known racial/ethnic variation in diet, we restricted
the patient population to Caucasians (N = 405). This study
was  conducted  in  accordance  with  the  Institutional  Review  Board, 
and  informed  consent  was  obtained  prior  to  personal  interview. 

Using  standardized  questionnaires,  demographic  information, 
personal  medical  history,  family  history  of  cancer  and  other  risk 
factor data were collected as previously described.5 The semiquantitative
validated Block food frequency questionnaire (FFQ)
(Health Habits and History Questionnaire), modified to incorporate
foods commonly consumed in the Southwestern diet, was
used to collect usual dietary intake for the year prior to diagnosis.18
Patients were asked to report the average frequency of intake
(per day, week, month or year) and usual portion size (i.e., small,
medium or large, relative to a defined medium portion) for ~180
food items. Approximately 80% of patients had the FFQ administered
within 6 months of registration at M.D. Anderson. We
conducted a subset analysis and found no differences in range of
responses between those who completed the FFQ within 6 months
and those who completed it later. All patients were instructed by
trained interviewers to provide answers for usual diet for the year
prior to diagnosis. FFQs were reviewed by registered dietitians for
completeness and acceptability. Only patients who completed the
risk factor questionnaire and reported daily caloric intake between
600 and 5,000 kcal/day were included in this study (N = 390).
DIETSYS+Plus (Version 5.9) along with the USDA National Nutrient
Database for Standard Reference (Release 17) was used to
calculate average daily intake of macronutrients and micronutrients
for each individual.

Body  mass  index  (BMI,  kg/m2)  was  calculated  from  self- 
reported  height  and  weight.  Obesity  was  defined  according  to  the 
National Heart, Lung and Blood Institute guideline of BMI ≥ 30.0
kg/m2.  Leisure  time  physical  activity  was  categorized  based  on 
participant  response  to  “the  year  before  your  diagnosis,  how  often 
did  you  do  physical  activities  such  as  jogging,  biking  or  brisk 
walking  (long  enough  to  get  sweaty)?”  Family  history  of  PCa  in 
first-degree  relatives  was  defined  as  PCa  diagnosed  in  father, 
brother  or  son. 

Clinico-pathologic  characteristics  were  abstracted  by  trained 
study  personnel  from  medical  records  using  standardized  forms 
and  included  prostatectomy  Gleason  score,  pathological  stage 
(including surgical margin status and seminal vesicle involvement)
and preoperative PSA levels.5 Tumors were classified based
on pathological stage as pT2 (organ-confined) and pT3 (extraprostatic
extension +/- seminal vesicle invasion). Time to progression
was measured from date of prostatectomy to date of first
detectable prostate specific antigen (PSA) test (>0.1 ng/ml, biochemical
failure) or last date the patient was known to have no
evidence of disease (censor).

Statistical analysis

Categorical variables, such as family history, history of diabetes,
leisure-time physical activity, prostatectomy Gleason score,
margin status and pathological stage, were analyzed using χ2 or
Fisher’s exact tests to evaluate differences in the distribution of
the clinical, demographic and risk factor data. Continuous variables,
such as age, BMI, education and dietary intake, were compared
between groups using Student's t-tests. Since total and saturated
fat intake were highly correlated (r = 0.89 and r = 0.86,
respectively, p < 0.001 for both) with total daily energy intake,
we energy-adjusted fat consumption using the residual method.19
Energy-adjusted total and saturated fat intakes were categorized
into quartiles for initial analyses. Since risk of progression was
significantly higher among men in the upper quartile of total and
saturated fat consumption (i.e., Q4) as compared to those in the
lower 3 quartiles (i.e., Q1-Q3), analyses were conducted by
dichotomizing intake (high intake = Q4, lower intake = Q1-Q3).
To assess whether total fat or saturated fat was a better predictor
of outcome, parallel predictive models were constructed. Total
energy intake (kcal) was also evaluated as an independent predictor
of outcome modeled as a continuous variable. The Gleason
scores from prostatectomies were analyzed in 4 categories: 6, 7 (3
+ 4), 7 (4 + 3) and ≥8. The distribution of preoperative PSA was
skewed to the right; therefore, all values were log-transformed
prior to analysis, and PSA was analyzed as a continuous variable.
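
The energy adjustment and dichotomization described above can be sketched as follows. This is a minimal illustration and not the study's code: it assumes the common form of the residual method in which the nutrient is regressed on total energy and the mean nutrient intake is added back to each residual, and the function names and the choice of quartile cut point are hypothetical.

```python
import statistics


def energy_adjust(nutrient, energy):
    """Residual method: regress nutrient on energy and keep the residuals."""
    mean_e = statistics.fmean(energy)
    mean_x = statistics.fmean(nutrient)
    sxy = sum((e - mean_e) * (x - mean_x) for e, x in zip(energy, nutrient))
    sxx = sum((e - mean_e) ** 2 for e in energy)
    slope = sxy / sxx
    intercept = mean_x - slope * mean_e
    # Assumed convention: add the mean intake back so values stay in g/day.
    return [x - (intercept + slope * e) + mean_x
            for e, x in zip(energy, nutrient)]


def top_quartile_flags(values):
    """Flag values above the third quartile cut point (Q4 vs. Q1-Q3)."""
    q3 = statistics.quantiles(values, n=4)[2]
    return [v > q3 for v in values]
```

By construction the adjusted intakes are uncorrelated with total energy, so a man is classified as a high consumer only if his fat intake is high for his energy intake.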

Biochemical-failure-free  survival  rates  were  calculated  using  the 
Kaplan-Meier  method,  and  log-rank  tests  were  used  to  evaluate 
statistical significance. Univariate Cox proportional hazards models
allowed us to evaluate the crude effects of each factor of interest.
Variables with p < 0.10 were evaluated for inclusion in a multivariable
model that simultaneously adjusted for all other included
variables. The multivariable model was constructed in a forward
stepwise manner, and 95% confidence intervals were estimated for
all  point  estimates  using  2-sided  testing  (SPSS  version  12.0, 
Chicago,  IL).  The  final  multivariable  model  only  includes  factors 
shown  to  significantly  improve  the  predictive  value. 
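
For reference, the Kaplan-Meier (product-limit) estimate used for biochemical-failure-free survival can be computed as in the sketch below. The function name and input layout are hypothetical; the analyses in this study were run in SPSS, and the log-rank test and Cox models are not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct failure time.

    times  -- follow-up in months from prostatectomy
    events -- 1 if biochemical failure was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Censored patients contribute to the number at risk up to their last disease-free date but never lower the curve themselves.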


Results 

This  subset  of  390  men  was  representative  of  the  previously 
described  larger  cohort  with  respect  to  age  and  clinico-pathologic 
characteristics.5 Table I shows patient characteristics by
level of saturated fat intake [high saturated fat (HSF) diets and low
saturated fat (LSF) diets]. Compared to men who consumed LSF diets,
men who consumed HSF diets were younger (59.4 vs. 61.2 years
old, respectively; p = 0.03) and had higher BMIs at diagnosis (28.4
vs. 27.3 kg/m2, respectively; p = 0.03) (Table I). There were no
statistically significant differences in clinico-pathologic characteristics
(i.e., prostatectomy Gleason score, PSA, or pathological stage),
family history of PCa, education, history of diabetes or physical activity
between these 2 groups. As expected, men consuming HSF diets
also consumed more calories (2,292 vs. 2,088 kcal/day, respectively,
p = 0.04) and total fat (102 vs. 73 g/day, respectively, p < 0.001)
compared to men who ate LSF diets (Table I). The top contributors
to  daily  intake  of  saturated  fat  for  this  patient  population  were  beef 
steaks,  cheese  and  cheese  spreads,  hamburgers  and  cheeseburgers, 
eggs,  ice  cream  and  salad  dressing/mayonnaise. 

During  the  follow-up  period  (mean  =  97.3  months),  20%  of  the 
patients  with  pathologically  organ-confined  disease  experienced 
biochemical failure. Biochemical failure-free survival was estimated
using Kaplan-Meier survival methods stratified by saturated
fat intake (Fig. 1a). Men who ate HSF diets were significantly
more likely to experience biochemical failure (p = 0.006), and had
significantly shorter biochemical failure-free survival than men
who consumed less saturated fat (26.6 vs. 44.7 months, respectively,
p = 0.004). Five years after surgery, about 65% of men who
consumed HSF diets had no evidence of disease compared to 80%
of men who consumed LSF diets. Initial analyses of the risk of progression
indicated that men in the 2nd and 3rd quartiles of energy-adjusted
total and saturated fat intake had no appreciable change in
risk compared to the lowest quartile. For this reason, fat intake
(both total and saturated fat) was dichotomized as Q4 vs. Q1-Q3.

TABLE I - PARTICIPANT CHARACTERISTICS

Variable                          Low saturated fat   High saturated fat   p-value
                                  (N = 293)           (N = 97)
Age (mean ± SD)                   61.2 ± 6.8          59.4 ± 7.3           0.03
Education (years, mean)           15.3                15.4                 0.89
+ Family history of PCa in FDR1   58 (19.8)           26 (26.8)            0.15
BMI at Dx (kg/m2, mean)           27.3                28.4                 0.03
Diabetes diagnosis                11 (4.0)            8 (8.4)              0.09
Leisure time physical activity                                             0.22
  1+ times/wk                     218 (74.4)          65 (67.7)
  Few times/m                     33 (11.3)           10 (10.4)
  Rarely/Never                    42 (14.3)           21 (21.9)
Gleason score                                                              0.27
  6                               73 (24.9)           30 (30.9)
  7 (3 + 4)                       90 (30.7)           20 (20.6)
  7 (4 + 3)                       64 (21.8)           23 (23.7)
  ≥8                              66 (22.5)           24 (24.7)
PSA >10 ng/ml                     56 (19.6)           20 (21.7)            0.65
+ Surgical margin                 41 (14.2)           18 (18.8)            0.28
pT3/T4                            76 (26.1)           30 (31.3)            0.33
Calories (kcal/day)               2087.9              2292.1               0.04
Fat (g/day)                       73.1                101.6                <0.001
Saturated fat (g/day)             23.4                37.2                 <0.001
Unsaturated fat (g/day)           49.6                64.4                 <0.001
% Energy fat                      31.0                39.6                 <0.001
% Energy saturated fat            9.9                 14.5                 <0.001
PCa progression (%)               17.7                26.8                 0.05

1 FDR, first-degree relatives.

Using  Kaplan-Meier  methods,  we  evaluated  the  combined 
effects of obesity and saturated fat consumption (Fig. 1b). Men
who were both obese and consumed HSF diets had the shortest
biochemical failure-free survival (19 months), and nonobese men
who consumed LSF diets had the longest biochemical failure-free
survival (46 months; p < 0.001). Nonobese men who ate HSF
diets and obese men who ate LSF diets had intermediate
progression-free survival times (29.4 and 41.5 months, respectively).
Approximately 85% of nonobese men on LSF diets were biochemical
failure-free at 5 years after surgery, compared to 70% of obese men
on LSF diets and about 65% of nonobese and obese men on HSF diets.
The  interaction  between  saturated  fat  intake  and  obesity  was  not 
statistically  significant  (p  =  0.99). 

We used a multivariable Cox proportional hazards model to simultaneously
adjust for relevant clinico-pathologic variables (Table II). We found that
energy-adjusted HSF diet remained an independent predictor of
biochemical failure in our final model; PCa patients who consumed
HSF diets were almost twice as likely to experience biochemical
failure compared to men who ate less saturated fat (HR
= 1.98, p = 0.006). Increased BMI (continuous) was modestly
associated with increased risk of biochemical failure (HR = 1.05,
p = 0.05). Since lack of physical activity may be associated with
increased BMI and consuming a poorer diet (i.e., a diet high in
saturated fat), we evaluated the predictive utility of including
leisure-time physical activity in the multivariable model; however,
physical activity did not improve the overall fit of the model and
was not included in the final model. Multivariable analyses indicated
that inclusion of energy-adjusted saturated fat intake explained a
greater proportion of variance, as indicated by the log likelihood of
that model, compared to the model including total energy intake.
Saturated and total fat intake were significantly correlated
(r = 0.95, p < 0.001). However, saturated fat intake explained
significantly more overall variance in the model compared to total
fat. The addition of total fat intake into the multivariable model
with saturated fat intake had no appreciable impact on the overall
goodness of fit of the model, and total fat was not included.

Figure 1 - (a) Progression-free survival by saturated fat intake
(low vs. high) (LSF = low saturated fat intake; HSF = high saturated
fat intake). (b) Progression-free survival by saturated fat intake
and BMI (Obese = BMI ≥ 30 kg/m2; Non-obese = BMI < 30 kg/m2).
(c) Mean time to progression in months by saturated fat intake and BMI.

Saturated fat   BMI         Mean time to progression (months)
Low             Non-obese   45.9
Low             Obese       41.5
High            Non-obese   29.4
High            Obese       19.2

In initial multivariable models, energy intake alone was evaluated
as a potential predictor of failure. Parallel models incorporating
the same covariates and either total energy intake or
energy-adjusted saturated fat intake were constructed and compared;
the model with energy-adjusted saturated fat explained
more variance in the data and was better at predicting outcome
compared to the one with total energy intake. In contingency table
analysis, no association was found between energy-adjusted
saturated fat intake and total energy intake. The inclusion of
energy in the multivariable model neither significantly improved
the fit of the model nor affected the point estimates (Table II);
therefore energy intake was removed from the final model.

TABLE II - MULTIVARIABLE MODEL OF BIOCHEMICAL FAILURES

Variable                      Hazard ratio   95% CI
Without energy
  High saturated fat intake   1.95           1.19-3.19
  BMI (kg/m2, continuous)     1.061          1.00-1.12
With energy
  High saturated fat intake   1.90           1.16-3.11
  BMI (kg/m2, continuous)     1.062          1.00-1.12

Without energy: adjusted for pathologic stage, surgical margin involvement and
Gleason score. With energy: adjusted for pathologic stage, surgical margin
involvement, Gleason score and total energy intake.
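
As a reading aid for Table II, a hazard ratio and its 95% confidence interval can be converted back to an approximate standard error, Wald statistic and two-sided p-value. This sketch assumes the interval is a Wald interval that is symmetric on the log scale, which the text does not state explicitly; the function name is hypothetical.

```python
import math


def wald_from_ci(hazard_ratio, lower, upper, z_crit=1.959964):
    """Recover SE(log HR), the Wald z and a two-sided p-value from a 95% CI."""
    log_hr = math.log(hazard_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_hr / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value
```

Applied to the model without energy, `wald_from_ci(1.95, 1.19, 3.19)` gives a p-value of roughly 0.008.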

Discussion 

Our  results  showed  that  high  prediagnostic  saturated  fat  intake 
was  associated  with  a  2-fold  increased  risk  of  biochemical  failure 
in  this  cohort  of  390  Caucasian  men  with  localized  PCa  treated 
with  prostatectomy.  The  multivariable  model  indicated  that  this 
increase  in  risk  of  biochemical  failure  was  independent  of 
the  increased  risk  associated  with  obesity,  and  both  obese  and 
nonobese men who consumed HSF diets had shorter biochemical
failure-free  survival. 

Some  epidemiological  studies  found  a  direct  association 
between saturated fat intake and PCa risk and prognosis, especially
in advanced disease,16 suggesting that saturated fat may
play a role in PCa prognosis. However, not all studies have
adjusted for the effects of total energy intake, and the associations
or lack thereof reported in these studies may be partially attributable
to residual confounding. Additionally, our data support the
findings reported by Meyer et al. that prediagnostic HSF intake
was associated with increased PCa mortality.11 However, to our
knowledge,  no  studies  have  evaluated  the  combined  effects  of 
both  energy-adjusted  saturated  fat  intake  and  obesity  as  predictors 
of  PCa  progression. 

The mechanisms by which these associations affect PCa prognosis
have not been established, although some studies suggest
that alterations in insulin metabolism may be involved.20 In overweight
and obese nondiabetic men, diets high in saturated fat were
shown to induce insulin resistance, which has been suggested to
play a role in prognosis.21 Additionally, it has been shown that
men whose diets were highest in saturated fat had the highest levels
of IGF-1 and lowest levels of IGFBP-3 compared to men who
ate diets lower in saturated fat.22 Castrated xenograft mice injected
with LAPC-4, an androgen-sensitive PCa cell line, and fed an isocaloric
low-fat diet had significantly lower serum levels of insulin
and IGFBP-1/-2 as well as slower PCa progression compared to
similarly treated mice on a high-fat diet.20

Another plausible mechanism by which saturated fat may influence
PCa progression involves heterocyclic amine consumption,
since several key contributors to saturated fat intake (i.e., beef
steaks and hamburgers/cheeseburgers) are known to have high
levels of heterocyclic amines. These foods are often prepared
using high-heat generating methods, such as grilling or broiling,
which have been shown to significantly increase dietary intake of
heterocyclic amines, particularly
2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP), previously
demonstrated to have carcinogenic properties. Human prostate tissue
is capable of activating heterocyclic amines that can then bind to
DNA and form adducts, which have been associated with prostate
carcinogenesis.23 Additionally, PhIP-DNA adduct levels, a quantitative
measurement of PhIP exposure, have been demonstrated to show an
association with greater tumor volume and higher Gleason score
among African-Americans,24 both of which have been shown to
be associated with PCa progression. Higher PhIP intake has been
significantly associated with increased PSA levels, which is also a
predictor of PCa outcome.25 We were unable to evaluate heterocyclic
amine consumption since information on cooking methods
was  not  collected;  however,  future  studies  are  being  designed  to 
collect  and  incorporate  these  data. 

Sex hormone levels have been shown to be influenced by saturated
fat intake. Dietary intervention studies in healthy men have
shown that a low-fat diet decreased androgen levels both in serum
and urine26 and a high-fat diet increased plasma and urinary testosterone
and DHEA-S.27 These results demonstrate the ability of
short-term changes in fat intake to directly affect the hormonal milieu
known to play a key role in the natural history of PCa.28 Overall,
the evidence suggests that saturated fat might affect PCa prognosis
through several inter-related mechanisms and other dietary
components may act in concert or discordance.

This  study  has  some  limitations.  Nutritional  data  were  collected 
at  the  time  of  study  enrollment,  and  we  do  not  have  quantifiable 
information about how patients changed their diets since diagnosis.
There is potential for measurement error since the FFQ is
semiquantitative;  however,  this  error  should  be  minimized  as  we 
used  the  data  from  the  FFQs  simply  to  categorize  men  as  high  or 
low  consumers  of  nutrients  rather  than  compare  absolute  values. 
Our  patient  population  was  limited  to  Caucasians,  as  we  did  not 
have  sufficient  power  to  evaluate  inter-racial/ethnic  variation  in 
dietary  intake  in  conjunction  with  progression  vs.  no-progression. 
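
The high/low categorization described above can be illustrated with a minimal sketch. It assumes a median cut point, which this section does not specify, and the participant IDs and intake values are hypothetical, not study data. The function name `split_high_low` is likewise invented for illustration.

```python
from statistics import median


def split_high_low(intakes):
    """Label each participant 'high' or 'low' relative to the cohort median.

    The median cut point is an assumption made for this sketch; the actual
    cut point used in the study is not stated in this section.
    """
    cut = median(intakes.values())
    return {pid: ("high" if value > cut else "low")
            for pid, value in intakes.items()}


# Hypothetical saturated fat intakes (g/day) keyed by made-up participant IDs.
example = {"p01": 18.2, "p02": 31.5, "p03": 24.0, "p04": 40.1}
labels = split_high_low(example)

# A reporting error that does not move a participant across the cut point
# leaves every label unchanged.
noisy = dict(example, p02=35.0)
noisy_labels = split_high_low(noisy)
```

In this toy example the labels are identical for `example` and `noisy`, which is the sense in which categorizing intake may be less sensitive to semiquantitative measurement error than comparing absolute values.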

On  the  other  hand,  this  study  has  several  strengths.  The  patients 
comprising  our  cohort  were  all  diagnosed  with  clinically  localized 
disease,  received  the  same  treatment  and  did  not  have  adjuvant 
therapy postoperatively prior to biochemical failure. Since all
participants in this study are cancer patients interviewed at baseline
(i.e., prior to biochemical failure), there should be no difference in
recall between patients who experienced biochemical failure and
those who did not. Restricting our patient population to Caucasians
limits the effects of inter-racial/ethnic variation in food
consumption patterns, as well as other lifestyle and genetic
differences, which may help reduce the effects of confounding.

These  results  expand  upon  our  previous  finding  that  obesity  was 
associated  with  increased  risk  of  biochemical  failure  following 
prostatectomy,  and  suggest  that  saturated  fat  intake  plays  a  role  in 
PCa progression. Once these findings have been replicated in a larger
patient population from different racial/ethnic groups, future
interventions may be designed to decrease consumption of dietary
saturated fat to reduce risk of progression in PCa patients, as has been
done for breast cancer patients.29 It is our hope that these results
can be integrated into clinical practice to identify patients at high
risk of progression following definitive therapy. Increasing our
understanding of the interplay between modifiable factors, such as
lifestyle (e.g., diet), and disease characteristics may lead to the
development of targeted interventions for patients at increased risk
for biochemical failure.